Abstract. TLA+ is a language intended for the high-level specification of reactive,
distributed, and in particular asynchronous systems. Combining the linear-time
temporal logic TLA and classical set theory, it provides an expressive specification
formalism and supports assertional verification.

1 A TASTE OF TLA+ 

The specification language TLA+ was introduced by Leslie Lamport [22] for
the description of reactive and distributed, especially asynchronous, systems. In
this paper, I describe the semantical base of TLA+, which combines the linear-time
temporal logic TLA and Zermelo-Fraenkel set theory. My intention is not to
define a new or extended formalism nor to explain the use of TLA+ in practice.
Lamport's original work covers much more material than this paper. In particular,
his recent book [27] includes a tutorial introduction to writing specifications in
TLA+, formally defines the language of TLA+, and describes the tools that support
it. In contrast, this presentation of TLA+ emphasizes the mathematical machinery
underlying TLA+, explaining Lamport's choices from a logical perspective. It is
my hope that it will find some use for purposes such as comparing specification
formalisms or for constructing new tools to support system development in TLA+.

Before we begin exploring the semantics of TLA+, let us have a look at a simple
example that introduces the typical structure of a TLA+ specification. The TLA+
module SyncQueueInternal, shown in figure 1(b), describes an unbounded FIFO
queue, which is illustrated in figure 1(a). The external interface consists of an input
channel in and an output channel out. Internally, the FIFO maintains a queue q of
values that have been received via in but have not yet been sent via out.

The module consists of three sections, separated by horizontal bars for better
readability, that contain declarations, definitions, and assertions. This structure of
a module is conventional, but not mandatory: formally, a module is simply a list of
statements. Any identifier must have been declared or defined exactly once (possibly
in an imported module) before it is used.

(a) Pictorial representation.

    ---------------------- MODULE SyncQueueInternal ----------------------
    EXTENDS Sequences
    CONSTANT Message
    VARIABLES in, out, q
    ----------------------------------------------------------------------
    NoMsg  == CHOOSE x : x \notin Message
    Init   == q = << >> /\ in = NoMsg /\ out = NoMsg
    Enq(m) == /\ in /= m
              /\ in' = m /\ q' = Append(q, m)
              /\ out' = out
    Deq    == /\ q /= << >>
              /\ out' = Head(q) /\ q' = Tail(q)
              /\ in' = in
    Next   == (\E m \in Message : Enq(m)) \/ Deq
    vars   == <<in, out, q>>
    FifoI  == Init /\ [][Next]_vars /\ WF_vars(Deq)
    ----------------------------------------------------------------------
    THEOREM FifoI => /\ [](q \in Seq(Message))
                     /\ [][Deq => out' /= out]_vars
                     /\ \A m \in Message : (in = m) ~> (out = m)
    ======================================================================

(b) TLA+ specification with the internal behavior exposed.

    -------------------------- MODULE SyncQueue --------------------------
    CONSTANT Message
    VARIABLES in, out
    Internal(q) == INSTANCE SyncQueueInternal
    ----------------------------------------------------------------------
    Fifo == \EE q : Internal(q)!FifoI
    ======================================================================

(c) TLA+ interface specification.

Fig. 1. A FIFO queue with synchronous communication.

The first section declares SyncQueueInternal to be based on the standard TLA+
module Sequences, which defines finite sequences and associated operations. Next,
we find a declaration of the module parameters. The constant parameter Message
intuitively represents the set of messages that are to be sent via the FIFO queue.
The variable parameters in, out, and q represent the current state of the queue as
shown in figure 1(a); their values will change as messages are received and forwarded.

The second section contains a list of definitions, which constitute the main body
of the specification. The constant NoMsg is defined to equal some value that is not
an element of the set Message (section 4 explains why this definition is sensible).
The state predicate Init identifies legal initial states of the specification: the value
of q should be the empty sequence $\langle\rangle$, while both in and out should equal
the value NoMsg. For any value m, the formula Enq(m) characterizes state transitions
that correspond to an "enqueue" action¹: we require m to be different from the current
value of in so that the queue can recognize that the input channel has changed.
(This condition is not essential, but is introduced mainly for expository purposes.
An implementation could for example instantiate the parameter Message by a set of
pairs consisting of the underlying data and an extra bit, which serves to distinguish
two successive enqueue actions for the same data.) The value of the variable in
at the state following the transition, denoted by in', will be m, and the new value
of q is obtained by appending m at the end of whatever value q contains before
the transition. Finally, we stipulate that the output channel out should not change
during an enqueue action. The definition of the dequeue action Deq is similar. The
action Next is defined as the disjunction of all enqueue actions Enq(m), for m in
Message, and of the dequeue action Deq.
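The reading of Init as a predicate on single states, and of Enq(m), Deq, and Next as predicates on pairs of states, can be made concrete in a small executable sketch. The Python code below is only an illustration and not part of TLA+ or its tools: the finite set MESSAGE, the sentinel NO_MSG, and the representation of states as dictionaries are assumptions made for this example, as is the non-emptiness guard in deq, which the use of Head and Tail presupposes.

```python
# Sketch: the actions of the FIFO queue as Boolean functions on pairs of states.
# A state is a dict mapping the variable names "in", "out", "q" to values;
# a tuple stands in for a finite sequence.

MESSAGE = {"a", "b"}   # assumed finite instance of the constant Message
NO_MSG = "none"        # assumed value outside MESSAGE, standing in for NoMsg

def init(s):
    return s["q"] == () and s["in"] == NO_MSG and s["out"] == NO_MSG

def enq(m, s, t):
    # s is the state before the transition, t the state after it
    return (s["in"] != m
            and t["in"] == m and t["q"] == s["q"] + (m,)
            and t["out"] == s["out"])

def deq(s, t):
    return (s["q"] != ()                          # assumed guard: queue non-empty
            and t["out"] == s["q"][0] and t["q"] == s["q"][1:]
            and t["in"] == s["in"])

def next_(s, t):
    # disjunction of all enqueue actions and of the dequeue action
    return any(enq(m, s, t) for m in MESSAGE) or deq(s, t)

s0 = {"in": NO_MSG, "out": NO_MSG, "q": ()}
s1 = {"in": "a", "out": NO_MSG, "q": ("a",)}
s2 = {"in": "a", "out": "a", "q": ()}
```

Here (s0, s1) is an enqueue step for the message "a" and (s1, s2) is a dequeue step, whereas the pair (s0, s0) satisfies neither disjunct of next_.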

The main definition of module SyncQueueInternal is that of the temporal formula
FifoI, representing the "internal" specification of the FIFO queue. It is written
as a conjunction: the first conjunct Init asserts that the first state of any behavior
satisfying FifoI must respect the initial condition. The second conjunct specifies
the next-state relation of the queue. More precisely, it asserts that every transition
allowed by FifoI must either respect the formula Next or leave the expression
vars unchanged; the latter is defined as the tuple $\langle in, out, q \rangle$ containing
the state variables of interest. Because the value of a tuple is unchanged if and only if
all its components are unchanged, this formula admits "stuttering steps" that do not
affect the variables of interest. In a larger system that contains the FIFO queue
as a component, such steps may represent actions of different components. The
final conjunct of formula FifoI asserts a condition of weak fairness concerning the
action Deq. It rules out behaviors where from some state onward, the Deq action
is always enabled, but never occurs. Section 2 explores in more detail the temporal
logic TLA that underlies TLA+, and discusses the fundamental concept of stuttering
invariance.

¹ In this formula and throughout the paper, we use a standard TLA+ notation that
displays multi-line conjunctions and disjunctions as a list "bulleted" by the connective.
This layout makes long formulas easier to read and reduces the number of parentheses.

The third section of module SyncQueueInternal asserts a theorem: it claims
that the formula shown follows from the definitions. (In general, a module may state
assumptions about constants, and theorems should then follow from the definitions
and the stated or imported assumptions.) In a loose reading, the assertion of a
theorem in a module can be regarded merely as a comment that highlights the
specifier's intuitions. Formally, however, a theorem represents a proof obligation that
must be discharged for the module to be correct, and we will turn to proof rules for
verification in section 3. The theorem asserted in module SyncQueueInternal states
that every behavior that satisfies formula FifoI has the following properties:

• at every state, the value of the variable q is a finite sequence whose elements are 
contained in Message, 

• every Deq step changes the value of the output channel out, and 

• every message that appears on the input channel will eventually be output. 

The current version of TLA+ described in [27] does not contain a language for
writing proofs, although Lamport advocates a hierarchical proof notation [26].

Like HOL [16] and other logical specification languages, TLA+ is declarative:
the names of formulas such as Init, Next, or FifoI are formally irrelevant, although
it is good practice to make them meaningful. The meaning of a formula can always
be uniquely and compositionally determined from the meaning of its subformulas,
and how to do this is the main subject of the present paper. As in any logic, there
are many logically equivalent ways to express a specification. For example, we could
have replaced the definitions of Enq and Next by

    Enq  == /\ in' \in Message /\ in' /= in
            /\ q' = Append(q, in') /\ out' = out
    Next == Enq \/ Deq

without changing the meaning of formula FifoI.
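The claimed equivalence of the two formulations of Next can be checked by brute force on a small instance. The following Python sketch is an illustration under stated assumptions, not a proof: it fixes a two-element set MESSAGE, a sentinel NO_MSG, and queues of length at most two (all hypothetical choices), models Deq with an assumed non-emptiness guard, and compares the two next-state relations on every pair of states of that finite instance.

```python
from itertools import product

MESSAGE = ["a", "b"]        # assumed finite instance of Message
NO_MSG = "none"             # assumed value outside MESSAGE
CHANNEL = MESSAGE + [NO_MSG]
QUEUES = [q for n in range(3) for q in product(MESSAGE, repeat=n)]  # length <= 2
STATES = [{"in": i, "out": o, "q": q}
          for i, o, q in product(CHANNEL, CHANNEL, QUEUES)]

def deq(s, t):
    # assumed guard: the queue must be non-empty
    return (s["q"] != () and t["out"] == s["q"][0]
            and t["q"] == s["q"][1:] and t["in"] == s["in"])

def enq_param(m, s, t):
    # Enq(m) with an explicit message parameter
    return (s["in"] != m and t["in"] == m
            and t["q"] == s["q"] + (m,) and t["out"] == s["out"])

def next_param(s, t):
    return any(enq_param(m, s, t) for m in MESSAGE) or deq(s, t)

def enq_alt(s, t):
    # parameter-free variant: the message is read off the new value of "in"
    return (t["in"] in MESSAGE and t["in"] != s["in"]
            and t["q"] == s["q"] + (t["in"],) and t["out"] == s["out"])

def next_alt(s, t):
    return enq_alt(s, t) or deq(s, t)

# Compare both relations on every pair of states of the finite instance.
agree = all(next_param(s, t) == next_alt(s, t) for s in STATES for t in STATES)
```

Agreement on this instance is evidence only; the general equivalence follows because the existential quantifier over m in the first formulation is witnessed exactly by the value of in'.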

TLA+ does not hide the complexity of a system by using built-in data types; as 
we will see in section 4, every value is just a set. Similarly, it does not presuppose 
any fixed system model such as shared-variable or message-passing concurrency, 
synchronous or asynchronous communication, etc. Its expressiveness comes from 
the unfettered use of set theory and the mechanism of definitional extension. For 
our example, we have chosen the internal variable q to change at the same time as the 
interface variables in and out, representing a synchronous style of communication. 
A specification of a FIFO queue using asynchronous communication channels is 
presented in Lamport's book [27, ch. 4]. 

The specifications of module SyncQueueInternal describe the behavior of the
FIFO queue in terms of the three variables in, out, and q. One important principle
in writing specifications is that of information hiding, which requires a component
specification not to reveal the internal structure (or "implementation details") of
the component. In our example, the variable q is such an implementation detail:
as illustrated by the box in figure 1(a), only the behavior of the externally visible
variables in and out should be constrained by the queue specification. Module
SyncQueue, shown in figure 1(c), contains an interface specification of the FIFO
queue based on the previous specification. In fact, it declares in and out as its only
variable parameters. The following line instantiates the previously discussed module
SyncQueueInternal: any operator Op defined in that module can be referenced as
Internal(q)!Op in module SyncQueue. The general form of instantiation in TLA+
allows for substitution of expressions for module parameters; any remaining parameters
are implicitly instantiated with the identifier of the same name valid at the
point of instantiation; it is an error if that identifier has not been declared or defined.
In our case, the parameters Message, in, and out of module SyncQueueInternal are
instantiated by the corresponding parameters of module SyncQueue, whereas parameter
q is instantiated by the local parameter of the operator Internal.

Module SyncQueue then defines the formula Fifo, representing the interface
specification of the FIFO queue, as the formula obtained from Internal(q)!FifoI by
existential quantification over q. This formula is satisfied by every behavior where
in and out take the values as described by the internal specification, but where q
may take arbitrary values. (The precise semantics is defined in section 2.4.) In this
respect, existential quantification represents hiding of internal state components,
and formula Fifo specifies the interface of the FIFO queue.

2 TLA: THE TEMPORAL LOGIC OF ACTIONS 

TLA+ combines TLA, the Temporal Logic of Actions [25], and mathematical set 
theory. We now present the semantics of TLA, while sections 3 and 4 explore 
the verification of temporal formulas and the specification of data structures in set 
theory. Again, we emphasize that this exposition is aimed at a precise definition 
of TLA as a logical language; it does not attempt to explain how TLA is used to 
specify algorithms or systems. 

2.1 Rationale 

The logic of time has its origins in philosophy and linguistics, where it was intended
to formalize temporal references in natural language [21, 34]. Around 1975,
Pnueli [33] and others recognized that such logics could be useful as a basis for
the semantics of computer programs. In particular, traditional formalisms based
on pre- and post-conditions were found to be ill-suited for the description of reactive
systems that are continuously interacting with their environment and that
are not necessarily intended to terminate. Temporal logic, as it came to be called
in computer science, offered an elegant framework to describe safety and liveness
properties [10, 24] of reactive systems. Different dialects of temporal logic can be
distinguished according to the properties assumed of the underlying model of time
(e.g., discrete or dense) and to the connectives offered to refer to different moments
in time (e.g., future vs. past references). For computer science applications, the most
controversial distinction has been between linear-time and branching-time logics. In
the linear-time view, a system is identified with the set of its executions, modeled
as infinite sequences of states, whereas the branching-time view also considers the
branching structure of a system. Linear-time temporal logics, including TLA, suffice
to formulate correctness properties that hold of all the runs of a system, whereas
branching-time temporal logics can also express possibility properties such as the
existence of a path, from every reachable state, to a "reset" state. The discussion of
the relative merits and deficiencies of these two kinds of temporal logics is beyond
the scope of this paper, but see, e.g., Vardi [38] for a recent contribution to this
subject, with many references to earlier papers.

Despite initial enthusiasm about the usefulness of temporal logic as a language 
to describe individual system properties, attempts to actually write complete system 
specifications in temporal logic revealed that not even a component as simple as a 
FIFO queue could be unambiguously specified [35]. This observation has led many
researchers to propose that reactive systems should be modeled by some variant 
of state machines while temporal logic was retained as a high-level language to 
describe the correctness properties. A major breakthrough came with the insight 
that temporal logic properties are decidable for finite-state models, and such model 
checking techniques [13] are nowadays routine for the debugging of hardware circuits 
and communication protocols. 

Another weakness of standard temporal logic becomes apparent when one at- 
tempts to compare two specifications of the same system, written at different levels 
of abstraction. Specifically, atomic system actions are usually described via a "next- 
state" operator, but the "grain of atomicity" typically changes during refinement, 
complicating comparisons between specifications. For example, we might want to 
refine the specification of the FIFO queue of figure 1(b) such that the operation 
of appending an element to a queue is described as a sequence of more elementary 
assignments. 

TLA has been designed as a formalism where system specifications and their
properties are expressed in the same language, and where refinement is reduced
to elementary logic. The problems mentioned above are addressed in the following
ways: "internal" specifications are written by defining their initial conditions
and next-state relations, resembling the description of state machines, and are
augmented by liveness and fairness conditions. Abstractness in the sense of information
hiding is ensured by quantification over state variables. The refinement problem is
solved by systematically allowing for stuttering steps that do not change the values
of the state variables of interest; an implementation is allowed to refine such
high-level stuttering into lower-level state changes. Similar ideas can be found in
Back's refinement calculus [11] and in more recent versions of Abrial's B method [9].
However, in order to prevent infinite stuttering, these formalisms require side
conditions that are expressed in terms of well-founded orderings. Temporal logic can
state such requirements more abstractly in terms of high-level fairness conditions
that must be preserved by a refinement, using any combination of fairness conditions
and arguments based on well-founded orderings.

Based on these concepts, TLA provides a unified logical language to express 
system specifications and their properties. A single set of logical rules is used for 
system verification and for proving refinement. 

2.2 Transition formulas 

The language of TLA is two-tiered: the base tier contains formulas that describe 
states and state transitions, whereas the top tier consists of temporal formulas that 
are evaluated over infinite sequences of states. In this section, we define the syntax 
and semantics of transition formulas, whereas the following sections will consider 
temporal formulas. Because transition formulas are just ordinary (untyped, first- 
order) predicate logic, this section can be quite brief. 

Assume a given signature of first-order predicate logic, consisting of:

• at most denumerable sets $\mathcal{L}_F$ and $\mathcal{L}_P$ of function and predicate
symbols, each symbol equipped with its arity, and

• a denumerable set $\mathcal{V}$ of variables, partitioned into denumerable sets
$\mathcal{V}_F$ and $\mathcal{V}_R$ of flexible and rigid variables.

These sets should be disjoint from one another; moreover, no variable in $\mathcal{V}$
should be of the form $v'$. By $\mathcal{V}_F'$, we denote the set
$\{v' \mid v \in \mathcal{V}_F\}$ of primed flexible variables, and by $\mathcal{V}_E$,
the union $\mathcal{V} \cup \mathcal{V}_F'$ of primed and unprimed variable symbols.

Transition functions and transition predicates (also called actions) are terms and
formulas built from the symbols in $\mathcal{L}_F$ and $\mathcal{L}_P$, and from the
variables in $\mathcal{V}_E$. For example, if $f$ is a ternary function symbol, $p$ is a
unary predicate symbol, $x \in \mathcal{V}_R$, and $v \in \mathcal{V}_F$, then the term
$e$ defined as $f(v, x, v')$ is a transition function, and the formula $C$ defined as
$\exists v' : p(f(v, x, v')) \land \lnot(v' = x)$ is an action. We omit a formal
inductive definition of the syntax of transition functions and formulas. Collectively,
we use the term transition formula to refer to transition functions and predicates.

The semantics of transition formulas is also unsurprising. It is based on a first-order
interpretation, which defines a universe of values and interprets each symbol
in $\mathcal{L}_F$ by a function and each symbol in $\mathcal{L}_P$ by a relation of
appropriate arities. In preparation for the semantics of temporal formulas, we
distinguish between the valuations of flexible and rigid variables. A state is a mapping
of the flexible variables in $\mathcal{V}_F$ to values of the universe. Given two states
$s$ and $t$ and a valuation $\xi$ of the rigid variables in $\mathcal{V}_R$, we can define
the valuation $\alpha_{s,t,\xi}$ of the variables in $\mathcal{V}_E$ as the mapping such
that $\alpha_{s,t,\xi}(x) = \xi(x)$ for $x \in \mathcal{V}_R$,
$\alpha_{s,t,\xi}(v) = s(v)$ for $v \in \mathcal{V}_F$, and
$\alpha_{s,t,\xi}(v') = t(v)$ for $v' \in \mathcal{V}_F'$. The semantics of a transition
function or transition formula $E$, written $[\![E]\!]^{\xi}_{s,t}$, is then simply the
standard predicate logic semantics of $E$ with respect to the extended valuation
$\alpha_{s,t,\xi}$.
We say that a transition predicate $A$ is valid for the given interpretation iff
$[\![A]\!]^{\xi}_{s,t}$ is true for all states $s$, $t$ and all valuations $\xi$. It is
satisfiable iff $[\![A]\!]^{\xi}_{s,t}$ is true for some $s$, $t$, and $\xi$. Similarly,
$A$ is valid (satisfiable) for a class $\mathcal{C}$ of interpretations iff it is
valid for all (satisfiable for some) interpretations in $\mathcal{C}$.
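To make the extended valuation and the notions of validity and satisfiability tangible, here is a minimal Python sketch over a finite universe. Everything concrete in it is an assumption for illustration: the three-element UNIVERSE, the particular interpretations chosen for the ternary function symbol f and the unary predicate symbol p, and the encoding of a primed variable v' as the dictionary key "v'".

```python
from itertools import product

UNIVERSE = [0, 1, 2]    # assumed finite universe of values

def alpha(s, t, xi):
    """Extended valuation: rigid variables from xi, unprimed flexible
    variables from s, primed flexible variables from t."""
    val = dict(xi)
    val.update(s)
    val.update({name + "'": value for name, value in t.items()})
    return val

# Hypothetical interpretation of the symbols f and p.
def f(a, b, c):
    return (a + b + c) % 3

def p(a):
    return a == 0

def e(val):
    # the transition function f(v, x, v')
    return f(val["v"], val["x"], val["v'"])

def action_c(val):
    # the action  EXISTS v' : p(f(v, x, v')) AND NOT (v' = x);
    # v' is bound here, so the value stored under "v'" is never consulted
    return any(p(f(val["v"], val["x"], d)) and d != val["x"] for d in UNIVERSE)

def valid(formula):
    # true for all states s, t and all valuations xi of this interpretation
    return all(formula(alpha({"v": a}, {"v": b}, {"x": c}))
               for a, b, c in product(UNIVERSE, repeat=3))

def satisfiable(formula):
    return any(formula(alpha({"v": a}, {"v": b}, {"x": c}))
               for a, b, c in product(UNIVERSE, repeat=3))

val = alpha({"v": 1}, {"v": 2}, {"x": 0})
```

Under this particular interpretation, action_c is satisfiable but not valid, and its truth value does not depend on the second state.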

Finally, the notions of free and bound variables in a transition formula are
defined as usual, as is the notion of substitution of a transition function $a$ for a
variable $v$, written $E[a/v]$. We assume that capture of free variables in a substitution
is avoided by an implicit renaming of bound variables. For example, the set of free
variables of the transition function $e$ shown above is $\{x, v, v'\}$, and $v'$ is a bound
variable of the action $C$. We emphasize that at the level of transition formulas, we
consider $v$ and $v'$ to be distinct, unrelated variables.

State formulas are transition formulas that do not contain free primed flexible
variables. For example, the action $C$ above is actually a state predicate. Because
the semantics of state formulas only depends on a single state, we simply write
$[\![P]\!]^{\xi}_{s}$ when $P$ is a state formula. The subclass of constant formulas is
even more restrictive in that only free occurrences of rigid variables are allowed;
consequently, their semantics depends only on the valuation $\xi$. (Arguably, rigid
formulas would be a more appropriate name for this class, but the TLA literature
consistently uses the designation constant formulas.)

TLA introduces some specific abbreviations at the level of transition formulas. If
$E$ is a state formula then $E'$ is the transition formula obtained from $E$ by replacing
each free occurrence of a flexible variable $v$ in $E$ with its primed counterpart $v'$
(where bound variables are renamed as necessary). For example, since $C$ is a state
formula, we may build the formula $C'$ by substituting $v'$ for $v$. Since $v'$ is bound in
$C$, this results in the formula $\exists y : p(f(v', x, y)) \land \lnot(y = x)$, up to
renaming of the bound variable.

For an action $A$, the state formula ENABLED $A$ is obtained by existential
quantification over all primed flexible variables that occur free in $A$. Thus,
$[\![\text{ENABLED } A]\!]^{\xi}_{s}$ holds if $[\![A]\!]^{\xi}_{s,t}$ holds for some
state $t$, that is, if action $A$ may occur in state $s$. For actions $A$ and $B$, the
action $A \cdot B$ is defined as $\exists z : A[z/v'] \land B[z/v]$ where $v$ is a
list of all flexible variables $v_i$ such that $v_i$ occurs free in $B$ or $v_i'$ occurs
free in $A$, and $z$ is a corresponding list of fresh variables. It follows that
$[\![A \cdot B]\!]^{\xi}_{s,t}$ holds iff both $[\![A]\!]^{\xi}_{s,u}$ and
$[\![B]\!]^{\xi}_{u,t}$ hold for some state $u$.

Because these abbreviations are defined in terms of quantification and substitution,
their interplay can be quite delicate. For example, ENABLED $P$ is by definition
just $P$ for any state predicate $P$, and therefore (ENABLED $P$)$'$ equals $P'$. On the
other hand, ENABLED ($P'$) is a constant formula: if $P$ does not contain any rigid
variables, then ENABLED ($P'$) is valid iff $P$ is satisfiable.

For an action $A$ and a state function $t$ we write $[A]_t$ to denote $A \lor t' = t$,
and $\langle A \rangle_t$ for $A \land \lnot(t' = t)$. Therefore, $[A]_t$ requires $A$ to
hold only if $t$ changes value during a transition, whereas the dual formula
$\langle A \rangle_t$ strengthens $A$ in requiring that $t$ changes value while $A$ holds
true.
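The abbreviations ENABLED, $A \cdot B$, $[A]_t$, and $\langle A \rangle_t$ can be explored on a toy example. The Python sketch below assumes a single flexible variable v ranging over a three-element set, so that the existential quantification over successor states becomes a finite search; the action incr and all helper names are hypothetical and serve only this illustration.

```python
VALUES = [0, 1, 2]                     # assumed finite range of the variable v
STATES = [{"v": a} for a in VALUES]    # all states over the single variable v

def incr(s, t):
    # hypothetical action:  v < 2  AND  v' = v + 1
    return s["v"] < 2 and t["v"] == s["v"] + 1

def enabled(action):
    # ENABLED A holds in s if some state t makes A true of the pair (s, t)
    return lambda s: any(action(s, t) for t in STATES)

def compose(a, b):
    # A . B holds of (s, t) iff A holds of (s, u) and B of (u, t) for some u
    return lambda s, t: any(a(s, u) and b(u, t) for u in STATES)

def square(action, fn):
    # [A]_t : A holds, or the state function fn keeps its value
    return lambda s, t: action(s, t) or fn(t) == fn(s)

def angle(action, fn):
    # <A>_t : A holds and the state function fn changes its value
    return lambda s, t: action(s, t) and fn(t) != fn(s)

def value_of_v(s):
    return s["v"]
```

For instance, enabled(incr) is false exactly in the state where v equals 2, and compose(incr, incr) relates the state with v = 0 to the state with v = 2 via the intermediate state with v = 1.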



On the Logic of TLA+
2.3 Temporal formulas 

We now turn to the temporal tier of TLA. Because it is less familiar than first-order
predicate logic and because we wish to give precise definitions, we devote much more
space to its presentation. However, the temporal formulas that one actually writes
in TLA+ specifications usually follow a standard idiom, and more than 95% of a
typical specification consists of definitions at the transition level.

The (temporal) formulas of TLA are inductively defined as follows:

• Every state formula is a formula.

• Boolean combinations (connectives including $\lnot$, $\land$, $\lor$, $\Rightarrow$,
and $\equiv$) of formulas are formulas.

• If $F$ is a formula then so is $\Box F$ ("always F").

• If $A$ is an action and $t$ is a state function then $\Box[A]_t$ ("always square A
sub t") is a formula.

• If $F$ is a formula and $x$ is a rigid variable then $\exists x : F$ is a formula.

• If $F$ is a formula and $v$ is a flexible variable then $\boldsymbol{\exists} v : F$
is a formula.

In particular, an action $A$ by itself is not a temporal formula, not even in the
form $[A]_t$. Actions can occur only in subformulas $\Box[A]_t$.

To determine free and bound variables at the temporal level, we do not distinguish
between primed and unprimed occurrences of flexible variables, and the
quantifier $\boldsymbol{\exists}$ binds both kinds of occurrences. More formally, the set
of free variables of a temporal formula is a subset of $\mathcal{V}_F \cup \mathcal{V}_R$.
The free occurrences of variables in a state formula $P$, considered as a temporal
formula, are precisely the free occurrences in $P$, considered as a transition formula.
However, variable $v \in \mathcal{V}_F$ has a free occurrence in $\Box[A]_t$ iff either
$v$ or $v'$ has a free occurrence in $A$ or in $t$. Similarly, substitution $F[e/v]$ of a
state function $e$ for a flexible variable $v$ substitutes both $e$ for $v$ and $e'$ for
$v'$ in the action subformulas of $F$, again after renaming bound variables as necessary.
For example, substitution of the state function $h(v)$, where $h \in \mathcal{L}_F$
and $v \in \mathcal{V}_F$, for $w$ in the temporal formula

    $\boldsymbol{\exists} v : p(v, w) \land \Box[q(v, f(w, v'), w')]_{g(v, w)}$

results in the formula (up to renaming of the bound variable)

    $\boldsymbol{\exists} u : p(u, h(v)) \land \Box[q(u, f(h(v), u'), h(v'))]_{g(u, h(v))}$

Because state formulas do not contain free occurrences of primed flexible variables,
the definitions of substitutions for transition formulas and for temporal formulas
agree on state formulas. The substitution of a (proper) transition function for a
variable is not allowed as it could result in an expression that is not a well-formed
TLA formula.
The semantics of temporal formulas is defined in terms of behaviors, which are
infinite sequences of states, and of valuations of the rigid variables. For a behavior
$\sigma = s_0 s_1 \ldots$, we write $\sigma_i$ to refer to state $s_i$, and $\sigma|_i$ to
denote the suffix $s_i s_{i+1} \ldots$. The following inductive definition assigns a truth
value $[\![F]\!]^{\xi}_{\sigma} \in \{\mathbf{t}, \mathbf{f}\}$ to every formula $F$:

• $[\![P]\!]^{\xi}_{\sigma} = [\![P]\!]^{\xi}_{\sigma_0}$: state formulas are evaluated at
the initial state of the behavior.

• The semantics of Boolean operators is the usual one.

• $[\![\Box F]\!]^{\xi}_{\sigma} = \mathbf{t}$ iff $[\![F]\!]^{\xi}_{\sigma|_i} = \mathbf{t}$
for all $i \in \mathbb{N}$: this is the usual clause from linear-time temporal logic.

• $[\![\Box[A]_t]\!]^{\xi}_{\sigma} = \mathbf{t}$ iff for all $i \in \mathbb{N}$,
$[\![t]\!]^{\xi}_{\sigma_i} = [\![t]\!]^{\xi}_{\sigma_{i+1}}$ or
$[\![A]\!]^{\xi}_{\sigma_i,\sigma_{i+1}} = \mathbf{t}$: such a formula
holds iff every state transition in $\sigma$ that does not leave $t$ unchanged satisfies $A$.

• $[\![\exists x : F]\!]^{\xi}_{\sigma} = \mathbf{t}$ iff $[\![F]\!]^{\eta}_{\sigma} = \mathbf{t}$
for some valuation $\eta$ such that $\eta(y) = \xi(y)$ for all
$y \in \mathcal{V}_R \setminus \{x\}$: this is again the usual definition from predicate logic.

• The semantics of formulas $\boldsymbol{\exists} v : F$ will be defined in section 2.4 below.

Abbreviations for temporal formulas include the universal quantifiers $\forall$ and
$\boldsymbol{\forall}$ over rigid and flexible variables. The formula $\Diamond F$
("eventually F"), defined as $\lnot\Box\lnot F$, asserts that $F$ holds of some suffix of
the behavior; similarly, $\Diamond\langle A \rangle_t$ ("eventually angle A sub t") is
defined as $\lnot\Box[\lnot A]_t$ and asserts that some future transition satisfies $A$
and changes the value of $t$. We write $F \leadsto G$ ("F leads to G") for the formula
$\Box(F \Rightarrow \Diamond G)$, which asserts that every occurrence of $F$ will
eventually be followed by an occurrence of $G$. Combinations of the "always" and
"eventually" operators express "infinitely often" ($\Box\Diamond$) and "almost always"
($\Diamond\Box$). Observe that a formula can be both infinitely often true and infinitely
often false, thus "almost always" is strictly stronger than "infinitely often". These
combinations are especially important as the building blocks to formulate fairness
conditions. In particular, weak and strong fairness for an action $\langle A \rangle_t$
are defined as

    $\mathrm{WF}_t(A) \;\triangleq\; (\Box\Diamond\lnot\,\text{ENABLED}\,\langle A \rangle_t) \lor \Box\Diamond\langle A \rangle_t \quad (\equiv\; \Diamond\Box\,\text{ENABLED}\,\langle A \rangle_t \Rightarrow \Box\Diamond\langle A \rangle_t)$
    $\mathrm{SF}_t(A) \;\triangleq\; (\Diamond\Box\lnot\,\text{ENABLED}\,\langle A \rangle_t) \lor \Box\Diamond\langle A \rangle_t \quad (\equiv\; \Box\Diamond\,\text{ENABLED}\,\langle A \rangle_t \Rightarrow \Box\Diamond\langle A \rangle_t)$

Weak fairness stipulates that an action $\langle A \rangle_t$ occurs infinitely often
during a behavior if it is almost always enabled; strong fairness even requires that the
action must happen infinitely often if it is infinitely often, but not necessarily
persistently, enabled.
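The semantic clauses for $\Box F$ and $\Box[A]_t$ and the definition of weak fairness can be tried out on ultimately periodic behaviors, that is, behaviors consisting of a finite prefix followed by a finite cycle repeated forever. This restriction is an assumption of the following Python sketch, made so that quantification over all positions reduces to a finite check; it is not part of TLA. The class Lasso, the combinator names, and the toggle example are hypothetical, and the enabling condition is supplied by the caller as a state predicate rather than computed.

```python
class Lasso:
    """Behavior of the form prefix . cycle . cycle . ..., stored finitely."""

    def __init__(self, prefix, cycle):
        assert cycle, "the cycle must be non-empty"
        self.states = list(prefix) + list(cycle)
        self.loop = len(prefix)            # index at which the cycle starts

    def succ(self, i):
        # position following i; the last position wraps to the cycle start
        return i + 1 if i + 1 < len(self.states) else self.loop

    def suffixes(self, i):
        # positions of all suffixes of the suffix that starts at position i
        return range(min(i, self.loop), len(self.states))

# A temporal formula is a function taking a behavior and a position.
def pred(p):
    return lambda b, i: p(b.states[i])

def neg(f):
    return lambda b, i: not f(b, i)

def disj(f, g):
    return lambda b, i: f(b, i) or g(b, i)

def always(f):
    return lambda b, i: all(f(b, j) for j in b.suffixes(i))

def eventually(f):
    return neg(always(neg(f)))             # defined as NOT always NOT f

def always_square(action, fn):
    def holds(b, i):
        steps = ((b.states[j], b.states[b.succ(j)]) for j in b.suffixes(i))
        return all(fn(s) == fn(t) or action(s, t) for s, t in steps)
    return holds

def eventually_angle(action, fn):
    return neg(always_square(lambda s, t: not action(s, t), fn))

def weak_fair(action, fn, enabled):
    return disj(always(eventually(neg(pred(enabled)))),
                always(eventually_angle(action, fn)))

x = lambda s: s["x"]
toggle = lambda s, t: t["x"] == 1 - s["x"]
toggle_enabled = lambda s: True            # a toggle step is always possible
oscillate = Lasso([], [{"x": 0}, {"x": 1}])     # 0, 1, 0, 1, ...
stall = Lasso([{"x": 0}], [{"x": 1}])           # 0, 1, 1, 1, ...
```

Both behaviors satisfy always_square(toggle, x), since the second one only stutters after its first step, but only oscillate satisfies weak_fair(toggle, x, toggle_enabled): in stall, the toggle action stays enabled forever without ever occurring.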

2.4 Stuttering invariance and quantification 

Formulas $\Box[A]_t$ allow for "stuttering": besides state transitions that satisfy $A$,
they also admit any transitions that do not change the state function $t$. In particular,
duplications of states cannot be observed by formulas of this form. Stuttering
invariance is important in connection with refinement and composition [24].
To formalize this notion, for a set $V$ of flexible variables we define two states $s$
and $t$ to be $V$-equivalent, written $s =_V t$, iff $s(v) = t(v)$ for all $v \in V$. We
define $V$-stuttering equivalence, written $\approx_V$, as the smallest equivalence
relation on behaviors that relates $\rho \circ \langle s \rangle \circ \sigma$ and
$\rho \circ \langle t, u \rangle \circ \sigma$, for any finite sequence of states $\rho$,
infinite sequence of states $\sigma$, and $V$-equivalent states $s =_V t =_V u$.
Intuitively, $V$-stuttering equivalence allows for duplication and deletion of finite
repetitions of $V$-equivalent states. In particular, the relation
$\approx_{\mathcal{V}_F}$, which we also write as $\approx$, identifies two
behaviors that differ by duplications or deletions of identical states.
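The effect of this definition can be illustrated by a small Python sketch that works on finite sequences of states. Using finite sequences in place of behaviors is an assumption of the sketch and ignores infinite tails. Two such sequences are treated as $V$-stuttering equivalent when they yield the same sequence after projecting every state onto $V$ and collapsing consecutive repetitions; the function names are hypothetical.

```python
def project(state, variables):
    # restriction of a state to the variables in V, in a fixed order
    return tuple(state[v] for v in sorted(variables))

def collapse(states, variables):
    """Project each state onto V and drop consecutive V-equivalent repetitions."""
    result = []
    for s in states:
        p = project(s, variables)
        if not result or result[-1] != p:
            result.append(p)
    return result

def stuttering_equivalent(sigma, tau, variables):
    return collapse(sigma, variables) == collapse(tau, variables)

sigma = [{"v": "c", "w": "c"}, {"v": "d", "w": "c"}, {"v": "d", "w": "d"}]
tau = [{"v": "c", "w": "c"}, {"v": "d", "w": "d"}]
```

The sequences sigma and tau are equivalent with respect to {"w"}, because the second state of sigma repeats the value of w, but they are not equivalent with respect to {"v", "w"}.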

The fundamental theorem asserting that TLA is not expressive enough to distinguish
stuttering-equivalent behaviors can now be formally stated as follows:

Theorem 1 (stuttering invariance). Assume that $F$ is a TLA formula whose free
flexible variables are among $V$, that $\sigma \approx_V \tau$ are $V$-stuttering
equivalent behaviors, and that $\xi$ is a valuation. Then
$[\![F]\!]^{\xi}_{\sigma} = [\![F]\!]^{\xi}_{\tau}$.

For TLA formulas without quantification over flexible variables, it is not hard
to prove theorem 1 from the semantic clauses of section 2.3 by induction on the
structure of formulas [25, 6]. On the other hand, quantification over flexible variables
requires some attention: the "obvious" semantic clause for formulas
$\boldsymbol{\exists} v : F$ would read
$[\![\boldsymbol{\exists} v : F]\!]^{\xi}_{\sigma} = \mathbf{t}$ iff
$[\![F]\!]^{\xi}_{\tau} = \mathbf{t}$ for some behavior $\tau$ whose states $\tau_i$ agree
with the corresponding states $\sigma_i$ on all variables except for $v$. This definition,
however, would not preserve stuttering invariance. For example, consider the formula

    $F \;\triangleq\; v = c \land w = c \land \Diamond(w \neq c) \land \Box[v \neq c]_w$

         v:  c  d  d  ...
         w:  c  c  d  ...

This formula requires that both variables $v$ and $w$ initially equal the constant $c$,
that eventually, $w$ is different from $c$, and that $v$ must be different from $c$
whenever $w$ changes value. Any behavior $\sigma$ that satisfies $F$ must therefore
contain two distinct transitions, the first of which changes $v$ from $c$ to some other
value (but preserving the value of $w$), while the second transition changes $w$, as
indicated in the picture. In particular, $\sigma_1(w)$ must equal $c$, hence the above
definition of quantification implies that $\tau_1(w)$ must equal $c$, for any behavior
$\tau$ satisfying the formula $\boldsymbol{\exists} v : F$. However, the behavior $\tau$
obtained from $\sigma$ by removing the second state (where $v$ equals $d$ but $w$ equals
$c$) is $\{w\}$-stuttering equivalent to $\sigma$. Because $w$ is the only free flexible
variable of $\boldsymbol{\exists} v : F$, theorem 1 asserts that $\tau$ should satisfy
$\boldsymbol{\exists} v : F$, although $\tau_1(w)$ is different from $c$.

In other words, the definition of quantification over flexible variables must allow
for the removal of transitions that modify only the bound variables. This observation
motivates the following semantic clause for quantified formulas: for a flexible variable
$v$, we say that two behaviors $\sigma$ and $\tau$ are equal up to $v$ iff $\sigma_i$ and
$\tau_i$ agree on all variables in $\mathcal{V}_F \setminus \{v\}$, for all
$i \in \mathbb{N}$. We say that $\sigma$ and $\tau$ are similar up to $v$, written
$\sigma \simeq_v \tau$, iff there exist behaviors $\sigma'$ and $\tau'$ such that

• $\sigma$ and $\sigma'$ are stuttering equivalent ($\sigma \approx \sigma'$),

• $\sigma'$ and $\tau'$ are equal up to $v$, and

• $\tau$ and $\tau'$ are again stuttering equivalent ($\tau \approx \tau'$).

Now, we define $[\![\boldsymbol{\exists} v : F]\!]^{\xi}_{\sigma} = \mathbf{t}$ iff
$[\![F]\!]^{\xi}_{\tau} = \mathbf{t}$ holds for some behavior $\tau$ similar to
$\sigma$ up to $v$. Because the definition of $\simeq_v$ explicitly allows for stuttering,
theorem 1 is preserved for all TLA formulas.

2.5 Properties, refinement, and composition 

As we have seen in the example of the FIFO queue, TLA uses the same formal- 
ism of temporal logic to represent system specifications and properties. System 
specifications are usually written in the form 

3 a; : Init A D[Ncxt] v A L 

where v is the list of all relevant state variables, x is the list of internal (hidden) 
variables, Init is a state predicate representing the initial condition, Next is an ac- 
tion that describes the next-state relation, usually written as a disjunction of more 
elementary actions, and $L$ is a conjunction of formulas $\mathrm{WF}_v(A)$ or $\mathrm{SF}_v(A)$ asserting
fairness assumptions of individual disjuncts of $\mathit{Next}$. However, other forms of specifications
are possible and can occasionally be useful. Asserting that a property $F$
holds of a specification $S$ amounts to saying that every behavior that satisfies $S$ must
also satisfy $F$; in other words, it asserts the validity of the implication $S \Rightarrow F$. For
example, the theorem asserted in module SyncQueueInternal states three essential
properties of the FIFO queue.

Unlike most other temporal logics, TLA is intended to support stepwise system 
development by refinement of specifications [11]. The basic idea of refinement con- 
sists in successively adding implementation detail while preserving the properties 
required at an abstract level. In a refinement-based approach to system develop- 
ment, one proceeds by writing successive models, each of which introduces some 
additional detail while preserving the essential properties of the preceding model. 
Fundamental properties of a system can thus be established at high levels of abstrac- 
tion, errors can be detected in early phases, and the complexity of formal assurance 
is spread over the entire development process. A refinement $C$ preserves all TLA
properties of an abstract specification $A$ if and only if for every formula $F$, if $A \Rightarrow F$
is valid, then so is $C \Rightarrow F$. This condition is in turn equivalent to requiring the validity
of $C \Rightarrow A$. Because $C$ will contain extra variables to represent the lower-level
detail, and because these variables will change in transitions that have no counterpart
at the abstract level, stuttering invariance of TLA formulas is essential to make
validity of implication a reasonable definition of refinement.

Stuttering invariance is also essential for composition to be representable as 
conjunction [18]. In fact, if $A$ and $B$ are TLA specifications of two components, then
$A \land B$ describes those behaviors that satisfy both components' initial conditions,
that allow actions of either process to occur, synchronizing on common variables
(which represent interfaces between the components), and that satisfy all relevant
liveness properties. In particular, stuttering invariance ensures that each component 
may perform local actions without interfering with the specification of the other 
component. 

As a test of these ideas, we might try to prove that two FIFO queues in a row 
again implement a FIFO queue. Let us assume that the two queues are connected 
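by a channel $\mathit{mid}$; then the above principles seem to imply that the formula (see footnote 2)

    $\mathit{Fifo}[\mathit{mid}/\mathit{out}] \land \mathit{Fifo}[\mathit{mid}/\mathit{in}] \Rightarrow \mathit{Fifo}$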

is valid. Unfortunately, this is not true, for the following reason: formula Fifo implies 
that the in and out channels never change simultaneously, whereas the conjunction 
on the left-hand side allows such changes (if the left-hand queue performs an Enq 
action, while the right-hand queue performs a Deq). This technical problem can 
be attributed to a design decision taken in the specification of the FIFO queue to 
disallow simultaneous changes to its input and output interfaces, a specification style 
known as "interleaving specifications". In fact, the argument merely shows that the
composition of two interleaving queues does not implement an interleaving queue.
Choosing an interleaving or a non-interleaving style is an artifact of the model that
represents the actual system; interleaving specifications are usually easier to write
and to understand. The problem disappears if we explicitly add an "interleaving"
assumption for the composition: the implication

    $\mathit{Fifo}[\mathit{mid}/\mathit{out}] \land \mathit{Fifo}[\mathit{mid}/\mathit{in}] \land \Box[\mathit{in}' = \mathit{in} \lor \mathit{out}' = \mathit{out}]_{\mathit{in},\mathit{out}} \Rightarrow \mathit{Fifo}$    (1)

is valid and its proof will be considered in section 3.5. 

2.6 Variations and extensions 

We discuss some of the choices that we have made in the presentation of TLA, as 
well as possible extensions. 

Transition formulas and priming. Our presentation of TLA is based on standard
first-order logic, to the extent possible. In particular, we have defined transition
formulas as formulas of ordinary predicate logic over a large set $\mathcal{V}_E$ of variables where
$v$ and $v'$ are unrelated. An alternative presentation would consider $'$ as an operator,
resembling the next-time modality of temporal logic. In fact, this appears to be the
presentation preferred by Lamport [27]. The semantics of temporal formulas is unaffected
by the choice of presentation, and the style adopted in this paper corresponds
well to the verification rules of TLA, explored in section 3.



2 TLA+ introduces concrete syntax, based on module instantiation, for writing substitutions
such as $\mathit{Fifo}[\mathit{mid}/\mathit{out}]$.

Compositional verification. We have argued in section 2.5 that composition is
represented in TLA as conjunction. Because components can rarely be expected 
to operate correctly in arbitrary environments, their specifications usually include 
some assumptions about the environment. An open system specification is one that 
does not constrain its environment; it asserts that the component will function cor- 
rectly provided that the environment behaves as expected. One way to write such 
specifications is in the form of implications $E \Rightarrow M$ where $E$ describes the environment
assumptions and $M$, the component specification. However, it turns out that
often a stronger form of specifications is desirable that requires the component to
adhere to its description $M$ for at least as long as the environment has not broken its
obligation $E$. In particular, when systems are built from "open" component specifications,
this form, written $E \stackrel{+}{\triangleright} M$, allows for a strong composition rule that can
discharge mutual assumptions between components [4, 14]. It can be shown that
the operator $\stackrel{+}{\triangleright}$ is actually definable in TLA, and that the resulting composition
rule can be justified in terms of an abstract logic of specifications, supplemented by
principles specific to TLA [5, 7].

TLA*. TLA defines distinct tiers of transition formulas and temporal formulas, 
where transition formulas must be guarded by "brackets" to ensure stuttering invari- 
ance. Although the separation between the two tiers is natural when writing system 
specifications, it is not a prerequisite to obtaining stuttering invariance. In [32], 
I have defined the logic TLA* whose syntax distinguishes the two classes of pure 
and impure formulas. Whereas pure formulas of TLA* contain impure formulas in 
the same way that temporal formulas of TLA contain transition formulas, impure 
formulas generalize transition formulas in that they admit Boolean combinations of 
$F$ and $\circ G$, where $F$ and $G$ are pure formulas and $\circ$ is the next-time modality of
temporal logic. For example, the TLA* formula

    $\Box[A \Rightarrow \circ\Diamond\langle B \rangle_u]_t$

requires that every $\langle A \rangle_t$ action must eventually be followed by $\langle B \rangle_u$. Assuming
appropriate syntactic conventions, TLA* is a generalization of TLA because every
TLA formula is also a TLA* formula, with the same semantics. On the other hand,
it can be shown that every TLA* formula can be expressed in TLA using some
additional quantifiers. For example, the TLA* formula above is equivalent to the
TLA formula (see footnote 3)

    $\boldsymbol{\exists} v : \land\ \Box((v = c) \equiv \Diamond\langle B \rangle_u)$
           $\land\ \Box[A \Rightarrow v' = c]_t$

where $c$ is a constant and $v$ is a fresh flexible variable. TLA* thus offers a richer
syntax without increasing the expressiveness, allowing high-level requirement specifications
to be expressed more directly. On the other hand, the propositional fragment
of TLA* admits a rather straightforward axiomatization. (No proof system
is known for propositional TLA, although Abadi [1] proposed a rather involved axiomatization
of an early version of TLA that was not invariant under stuttering.)
For example,

    $\Box[F \Rightarrow \circ F]_v \Rightarrow (F \Rightarrow \Box F)$

where $F$ is a temporal formula and $v$ is a tuple containing all flexible variables
with free occurrences in $F$, is a TLA* formulation of the usual induction axiom of
temporal logic; this is a TLA formula only if $F$ is in fact a state formula.

3 Strictly, this equivalence is true only for universes that contain at least two distinct
values; one-element universes are not very interesting.



Binary temporal operators. Unlike standard linear-time temporal logics [30], 
TLA does not include binary operators such as until, because they are not necessary 
for writing system specifications, and because they can be confusing, especially when 
nested. These operators are, however, definable in TLA using quantification over
flexible variables. For example, suppose that $P$ and $Q$ are state predicates whose
free variables are among $w$, that $v$ is a flexible variable that does not appear in $w$,
and that $c$ is a constant. Then $P\ \mathbf{until}\ Q$ can be defined as the formula

    $\boldsymbol{\exists} v : \land\ (v = c) \equiv Q$
           $\land\ \Box[(v \neq c \Rightarrow P) \land (v' = c \equiv (v = c \lor Q'))]_{\langle v, w \rangle}$
           $\land\ \Diamond Q$

The idea is to use the auxiliary variable v to remember whether Q has already 
been true. As long as Q has been false, P is required to hold. For arbitrary TLA 
formulas F and G, the formula F until G can be defined along the same lines, 
using a technique as shown for the translation of TLA* formulas above. 



3 TLA PROOF RULES 



Since TLA formulas are used to describe systems as well as their properties, deduc- 
tive system verification can be based on logical axioms and rules of TLA. More 
precisely, a system described by formula Spec has property Prop if and only if 
every behavior that satisfies Spec also satisfies Prop, that is, iff the implication 
$\mathit{Spec} \Rightarrow \mathit{Prop}$ is valid over the class of interpretations where the function and predicate
symbols have the intended meaning. System verification, in principle, therefore
requires reasoning about sets of behaviors. The TLA proof rules are designed to 
reduce this temporal reasoning, as far as possible, to the proof of verification con- 
ditions expressed in the underlying predicate logic, a strategy that is commonly 
referred to as assertional reasoning. In this section, we state some typical rules and
illustrate their use; more information can be found elsewhere [25].

3.1 Invariants

Invariants characterize the set of states that can be reached during system execution; 
they constitute the basic safety properties of interest and are also the starting point 
for almost any verification attempt. In TLA, an invariant is expressed by a formula 
of the form $\Box I$ where $I$ is a state formula.

A basic rule for proving invariants is given by

    $\dfrac{I \land [N]_t \Rightarrow I'}{I \land \Box[N]_t \Rightarrow \Box I}$    (INV1)

This rule asserts that for every interpretation for which the antecedent $I \land [N]_t \Rightarrow I'$
is a valid transition formula, the consequent $I \land \Box[N]_t \Rightarrow \Box I$ is a valid temporal
formula. The antecedent states that every possible transition (stuttering or not)
preserves $I$; thus, if $I$ holds initially it is guaranteed to hold forever. Formally, the
correctness of rule (INV1) is easily established by induction on behaviors. Because
the antecedent is a transition formula, its proof relies on standard axioms and proof 
rules of predicate logic, augmented by "data" axioms that characterize the intended 
interpretations. 

For example, we can use (INV1) to prove the invariant $\Box(q \in \mathit{Seq}(\mathit{Message}))$ of
the FIFO queue specified in module SyncQueueInternal of figure 1(b). We have to
prove

    $\mathit{FifoI} \Rightarrow \Box(q \in \mathit{Seq}(\mathit{Message}))$    (2)

which, by rule (INV1), the definition of formula $\mathit{FifoI}$, and propositional logic can
be reduced to proving

    $\mathit{Init} \Rightarrow q \in \mathit{Seq}(\mathit{Message})$    (3)

    $q \in \mathit{Seq}(\mathit{Message}) \land [\mathit{Next}]_{\mathit{vars}} \Rightarrow q' \in \mathit{Seq}(\mathit{Message})$    (4)

Because the empty sequence is certainly a finite sequence of messages, (3) follows
from the definition of $\mathit{Init}$ and appropriate data axioms. Similarly, the proof of (4)
reduces to proving preservation of the invariant under stuttering, $\mathit{Deq}$, and $\mathit{Enq}(m)$
actions, for any $m \in \mathit{Message}$, all of which are again straightforward.
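The two verification conditions (3) and (4) can be checked exhaustively on a small finite instance. The Python sketch below is not the module of figure 1(b): the definitions of `Init`, `Enq(m)` and `Deq` are assumptions reconstructed from the surrounding text (in particular, `Enq(m)` is assumed to require that `m` differ from the current input), `MESSAGES` is a hypothetical two-element stand-in for the constant Message, and the queue length is bounded by `MAX_LEN`.

```python
from itertools import product

MESSAGES = ("a", "b")   # assumed finite stand-in for the constant Message
NO_MSG = None           # assumed stand-in for a value outside Message
MAX_LEN = 3             # bound on the queue lengths that are enumerated


def type_ok(state):
    """The invariant I: q is a finite sequence of messages."""
    _, q, _ = state
    return isinstance(q, tuple) and all(m in MESSAGES for m in q)


def init_states():
    """Assumed initial condition: empty queue, both channels hold NO_MSG."""
    return [(NO_MSG, (), NO_MSG)]


def successors(state):
    """All [Next]_vars successors under the assumed action definitions."""
    inp, q, out = state
    yield state                          # stuttering step
    for m in MESSAGES:
        if m != inp:
            yield (m, q + (m,), out)     # Enq(m)
    if q:
        yield (inp, q[1:], q[0])         # Deq


def candidate_states():
    """All states with a well-typed queue of length at most MAX_LEN."""
    values = MESSAGES + (NO_MSG,)
    for n in range(MAX_LEN + 1):
        for q in product(MESSAGES, repeat=n):
            for inp, out in product(values, repeat=2):
                yield (inp, q, out)


# (3): Init => I
cond3 = all(type_ok(s) for s in init_states())
# (4): I /\ [Next]_vars => I'
cond4 = all(type_ok(t)
            for s in candidate_states() if type_ok(s)
            for t in successors(s))
print(cond3, cond4)  # True True
```

Note that condition (4) ranges over all states satisfying the invariant, not only over reachable ones; this is what makes the premise of (INV1) a plain transition formula.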



3.2 Step simulation 

The following rule can be used to prove "action invariants"; it relies on a previously
proven state invariant $I$:

    $\dfrac{I \land I' \land [M]_t \Rightarrow [N]_u}{\Box I \land \Box[M]_t \Rightarrow \Box[N]_u}$    (TLA2)

In particular, it follows from (TLA2) that the next-state relation can be strengthened
by an invariant:

    $\Box I \land \Box[M]_t \Rightarrow \Box[M \land I \land I']_t$

Note that the converse of this implication is not valid: the right-hand side holds of
any behavior where $t$ never changes, independently of the value of $I$.

We may use (TLA2) to prove that the FIFO queue never dequeues the same
value twice in a row:

    $\mathit{FifoI} \Rightarrow \Box[\mathit{Deq} \Rightarrow \mathit{out}' \neq \mathit{out}]_{\mathit{vars}}$    (5)

This proof requires an invariant that in particular asserts that no consecutive elements
of the internal queue are identical:

    $\mathit{Inv} \triangleq \text{LET } oq \triangleq \langle \mathit{out} \rangle \circ q$
           $\text{IN } \land\ \mathit{in} = oq[\mathit{Len}(oq)]$
              $\land\ \forall i \in 1..\mathit{Len}(oq) - 1 : oq[i] \neq oq[i+1]$

We have used some TLA+ syntax in formulating $\mathit{Inv}$: the local abbreviation $oq$
denotes the sequence obtained by prefixing the current value of the output channel
$\mathit{out}$ to the internal queue $q$; also, TLA+ represents a sequence $s$ as a function such
that its elements can be accessed as $s[1], \ldots, s[\mathit{Len}(s)]$. Formula $\mathit{Inv}$ asserts that the
current value of the input channel equals the last element of the sequence $oq$, and
that no two consecutive elements of $oq$ are identical. The proof of $\mathit{FifoI} \Rightarrow \Box\mathit{Inv}$
follows the pattern used in proving invariant (2) above, using rule (INV1).
For the proof of (5), rule (TLA2) requires that we show the validity of

    $\mathit{Inv} \land \mathit{Inv}' \land [\mathit{Next}]_{\mathit{vars}} \Rightarrow [\mathit{Deq} \Rightarrow \mathit{out}' \neq \mathit{out}]_{\mathit{vars}}$    (6)

The proof of (6) reduces to the three cases of a stuttering transition, an $\mathit{Enq}(m)$
action, and a $\mathit{Deq}$ action. Only the last case is non-trivial. Its proof relies on the
definition of $\mathit{Deq}$, which implies that $q$ is non-empty and that $\mathit{out}' = \mathit{Head}(q)$. In
particular, the sequence $oq$ contains at least two elements, and therefore $\mathit{Inv}$ implies
that $oq[1]$, which is just $\mathit{out}$, is different from $oq[2]$, which is $\mathit{Head}(q)$. This suffices
to prove $\mathit{out}' \neq \mathit{out}$.
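The argument for (6) can be replayed on a bounded instance. The following Python sketch checks that the invariant is preserved by every non-stuttering step and that every `Deq` step from a state satisfying it changes the output channel. As before, the action definitions are assumptions reconstructed from the text (notably that `Enq(m)` requires `m` to differ from the current input), and `MESSAGES`, `NO_MSG` and `MAX_LEN` are hypothetical parameters of the sketch.

```python
from itertools import product

MESSAGES = ("a", "b", "c")   # assumed finite stand-in for Message
NO_MSG = None                # assumed value outside Message
MAX_LEN = 3                  # bound on enumerated queue lengths


def inv(state):
    """Inv: in is the last element of oq, and oq has no equal neighbours."""
    inp, q, out = state
    oq = (out,) + q
    return inp == oq[-1] and all(oq[i] != oq[i + 1] for i in range(len(oq) - 1))


def enq_steps(state):
    """Successors by Enq(m), assuming the guard m != in."""
    inp, q, out = state
    return [(m, q + (m,), out) for m in MESSAGES if m != inp]


def deq_steps(state):
    """Successor by Deq, enabled only for a non-empty queue."""
    inp, q, out = state
    return [(inp, q[1:], q[0])] if q else []


def all_states():
    values = MESSAGES + (NO_MSG,)
    for n in range(MAX_LEN + 1):
        for q in product(MESSAGES, repeat=n):
            for inp, out in product(values, repeat=2):
                yield (inp, q, out)


good = [s for s in all_states() if inv(s)]

# Inv is preserved by every Enq and Deq step (stuttering is trivial).
inductive = all(inv(t) for s in good for t in enq_steps(s) + deq_steps(s))
# Condition (6), Deq case: Inv /\ Inv' /\ Deq => out' # out
cond6 = all(t[2] != s[2] for s in good for t in deq_steps(s) if inv(t))
print(inductive, cond6)  # True True

# Without Inv the conclusion fails: this state violates Inv, and Deq
# leaves the output channel unchanged.
bad = ("a", ("a",), "a")
print(inv(bad), deq_steps(bad)[0][2] == bad[2])  # False True
```

The last two lines show why the state invariant is needed as a hypothesis of (TLA2): the step property does not hold for arbitrary states.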

3.3 Liveness properties

Liveness properties, intuitively, assert that something good must eventually happen
[10, 23]. Because formulas $\Box[N]_t$ are satisfied by stuttering behavior and do not
require any progress, the proof of liveness properties must ultimately rely on fairness 
properties assumed of the specification. TLA provides rules to deduce elementary 
liveness properties from the fairness properties assumed of a specification; more 
complex properties can then be inferred with the help of well-founded orderings. 






18 S. Merz 

The following rule can be used to prove a leads-to formula from a weak fairness 
assumption; a similar rule exists for strong fairness. 



    $\dfrac{\begin{array}{l} I \land I' \land P \land [N]_t \Rightarrow P' \lor Q' \\ I \land I' \land P \land \langle N \land A \rangle_t \Rightarrow Q' \\ I \land P \Rightarrow \text{ENABLED}\,\langle A \rangle_t \end{array}}{\Box I \land \Box[N]_t \land \mathrm{WF}_t(A) \Rightarrow (P \leadsto Q)}$    (WF1)

In this rule, $P$ and $Q$ are state predicates, $I$ is again an invariant, $[N]_t$ represents
the next-state relation, and $\langle A \rangle_t$ is a "helpful action" [29] for which weak fairness
is assumed. Again, all three premises of (WF1) are transition formulas. To see why
the rule is correct, assume that $\sigma$ is a behavior satisfying $\Box I \land \Box[N]_t \land \mathrm{WF}_t(A)$, and
that $P$ holds of state $\sigma_i$. We have to show that $Q$ holds of some $\sigma_j$ with $j \geq i$. By
the first premise, any successor of a state satisfying $P$ has to satisfy $P$ or $Q$, so $P$
must hold for as long as $Q$ has not been true. The third premise ensures that in all of
these states, action $\langle A \rangle_t$ is enabled, and so the assumption of weak fairness ensures
that eventually $\langle A \rangle_t$ occurs, unless $Q$ has become true before, in which case we are
done. Finally, by the second premise, any $\langle A \rangle_t$-successor (which, by assumption, is
in fact an $\langle N \land A \rangle_t$-successor) of a state satisfying $P$ must satisfy $Q$, which proves
the claim.

For our running example, we can use rule (WF1) to prove that every message 
stored in the queue will eventually move closer to the head of the queue or even to 
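the output channel. Formally, let the state predicate $\mathit{at}(k, x)$ be defined by

    $\mathit{at}(k, x) \triangleq k \in 1..\mathit{Len}(q) \land q[k] = x$

We will use (WF1) to prove

    $\mathit{FifoI} \Rightarrow (\mathit{at}(k, x) \leadsto (\mathit{out} = x \lor \mathit{at}(k - 1, x)))$    (7)

where $k$ and $x$ are rigid variables. The following proof outline illustrates the application
of rule (WF1); the lower-level steps relying on data axioms are omitted.

1. $\mathit{at}(k, x) \land [\mathit{Next}]_{\mathit{vars}} \Rightarrow \mathit{at}(k, x)' \lor \mathit{out}' = x \lor \mathit{at}(k - 1, x)'$

   1.1. $\mathit{at}(k, x) \land m \in \mathit{Message} \land \mathit{Enq}(m) \Rightarrow \mathit{at}(k, x)'$

   1.2. $\mathit{at}(k, x) \land \mathit{Deq} \land k = 1 \Rightarrow \mathit{out}' = x$

   1.3. $\mathit{at}(k, x) \land \mathit{Deq} \land k > 1 \Rightarrow \mathit{at}(k - 1, x)'$

   1.4. $\mathit{at}(k, x) \land \mathit{vars}' = \mathit{vars} \Rightarrow \mathit{at}(k, x)'$

   1.5. Q.E.D.
        From steps 1.1-1.4 by the definitions of $\mathit{Next}$ and $\mathit{at}(k, x)$.

2. $\mathit{at}(k, x) \land \langle \mathit{Deq} \land \mathit{Next} \rangle_{\mathit{vars}} \Rightarrow \mathit{out}' = x \lor \mathit{at}(k - 1, x)'$
   Follows from steps 1.2 and 1.3 above.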

3. $\mathit{at}(k, x) \Rightarrow \text{ENABLED}\,\langle \mathit{Deq} \rangle_{\mathit{vars}}$
   For any $k$, $\mathit{at}(k, x)$ implies that $q \neq \langle\rangle$ and thus the enabledness condition.

However, rule (WF1) cannot be used to prove the stronger property that every
input to the queue will eventually be dequeued,

    $\mathit{FifoI} \Rightarrow \forall m \in \mathit{Message} : \mathit{in} = m \leadsto \mathit{out} = m$    (8)

because there is no single "helpful action": the number of $\mathit{Deq}$ actions necessary
to produce the input element on the output channel depends on the length of the
queue. Intuitively, the argument used to establish property (7) must be iterated.
The following rule formalizes this idea as an induction over a well-founded relation
$(D, \succ)$: a binary relation such that there does not exist an infinite descending chain
$d_1 \succ d_2 \succ \ldots$ of elements $d_i \in D$.

    $\dfrac{\begin{array}{l} (D, \succ) \text{ is well-founded} \\ F \Rightarrow \forall d \in D : (G \leadsto (H \lor \exists e \in D : d \succ e \land G[e/d])) \end{array}}{F \Rightarrow \forall d \in D : (G \leadsto H)}$    (LATTICE)

In this rule, $d$ and $e$ are rigid variables such that $d$ does not occur in $H$ and $e$ does
not occur in $G$. For convenience, we have stated rule (LATTICE) in a language of set
theory where, in particular, $\forall x \in S : F$ abbreviates the formula $\forall x : x \in S \Rightarrow F$.

Unlike the premises of the rules considered so far, the second hypothesis of rule
(LATTICE) is itself a temporal formula that requires that every occurrence of $G$,
for any value $d \in D$, be followed either by an occurrence of $H$, or again by some
$G$, for some smaller value $e$. Because the first hypothesis ensures that there cannot
be an infinite descending chain of values in $D$, eventually $H$ must become true.
In principle, the hypothesis of well-foundedness can itself be expressed in TLA by
asserting the validity of the formula

    $\land\ \forall d \in D : \neg(d \succ d)$
    $\land\ \boldsymbol{\forall} v : \Box(v \in D) \land \Box[v \succ v']_v \Rightarrow \Diamond\Box[\text{FALSE}]_v$

whose first conjunct expresses the irreflexivity of $\succ$ and whose second conjunct
asserts that any sequence of values in $D$ that can only change by decreasing with
respect to $\succ$ must eventually become stationary. In fact, if this formula is valid over
a given interpretation then $\succ$ is interpreted by a well-founded relation. In system
verification, well-foundedness is however usually considered as a "data axiom".

Choosing (Nat, >), the set of natural numbers with the standard "greater-than" 
relation as the well-founded domain, the proof of property (8) follows from property 
(7) and the invariant Inv defined in section 3.2 using rule (LATTICE). 
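The role of the well-founded measure can be illustrated by tracking the position of a message while the queue is drained. The Python sketch below is a simplification under stated assumptions: it applies only `Deq` steps (interleaved `Enq` steps would leave the position unchanged, by step 1.1 of the outline above), the definition of `Deq` is reconstructed from the text, and the helper `ranks_until_output` as well as the concrete state are hypothetical. It also assumes that the tracked message occurs in the queue.

```python
def at(state, k, x):
    """at(k, x): position k (1-based) of the queue holds x."""
    _, q, _ = state
    return 1 <= k <= len(q) and q[k - 1] == x


def deq(state):
    """Assumed Deq action: move the head of the queue to the output."""
    inp, q, out = state
    assert q, "Deq is only enabled for a non-empty queue"
    return (inp, q[1:], q[0])


def ranks_until_output(state, x):
    """Apply Deq until out = x, recording the k with at(k, x) before each step."""
    ranks = []
    while state[2] != x:
        k = next(k for k in range(1, len(state[1]) + 1) if at(state, k, x))
        ranks.append(k)
        state = deq(state)
    return ranks, state


start = ("m", ("a", "b", "m"), "z")   # in = m is the last queued message
ranks, final = ranks_until_output(start, "m")
print(ranks)      # [3, 2, 1]
print(final[2])   # m
```

The recorded positions form a strictly decreasing sequence of natural numbers, so the iteration must stop; this is the finite counterpart of instantiating (LATTICE) with (Nat, >).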

Lamport [25] lists further (derived) rules for liveness properties, including introduction
rules for proving formulas $\mathrm{WF}_t(A)$ and $\mathrm{SF}_t(A)$.

3.4 Simple temporal logic 



The proof rules considered so far support the derivation of typical correctness properties
of systems. In addition, TLA satisfies standard axioms and rules of linear-time
temporal logic that are useful when preparing the application of verification rules.
Figure 2 contains the axioms and rules of "simple temporal logic", adapted from
Lamport [25]. It can be shown that this is just a non-standard presentation of the
modal logic S4.2 [19], implying that these laws by themselves characterize a modal
accessibility relation for $\Box$ that is reflexive, transitive, and locally convex (or confluent).
The last condition asserts that for any state $s$ and states $t$, $u$ that are both
accessible from $s$ there is a state $v$ that is accessible from $t$ and $u$.

    (STL1)  $\dfrac{F}{\Box F}$                    (STL4)  $\Box(F \Rightarrow G) \Rightarrow (\Box F \Rightarrow \Box G)$

    (STL2)  $\Box F \Rightarrow F$                 (STL5)  $\Box(F \land G) \equiv (\Box F \land \Box G)$

    (STL3)  $\Box\Box F \equiv \Box F$             (STL6)  $\Diamond\Box(F \land G) \equiv (\Diamond\Box F \land \Diamond\Box G)$

    Fig. 2. Simple temporal logic.

3.5 Quantifier rules 

Although we have seen in section 2.4 that the semantics of quantification over flexible
variables is non-standard, the elementary proof rules for quantifiers are those familiar
from first-order logic:

    $F[c/x] \Rightarrow \exists x : F$    ($\exists$I)        $\dfrac{F \Rightarrow G}{(\exists x : F) \Rightarrow G}$    ($\exists$E)

    $F[t/v] \Rightarrow \boldsymbol{\exists} v : F$    ($\boldsymbol{\exists}$I)        $\dfrac{F \Rightarrow G}{(\boldsymbol{\exists} v : F) \Rightarrow G}$    ($\boldsymbol{\exists}$E)

In these rules, $x$ is a rigid and $v$ is a flexible variable. The elimination rules ($\exists$E)
and ($\boldsymbol{\exists}$E) require the usual proviso that the bound variable should not be free in
formula $G$. In the introduction rules, $t$ is a state function, while $c$ is a constant
function. Observe that if we allowed an arbitrary state function in rule ($\exists$I), we
could prove

    $\exists x : \Box(x = v)$    (9)

for any state variable $v$ from the premise $\Box(v = v)$, provable by (STL1). However,
formula (9) asserts that $v$ remains constant throughout a behavior, which can
obviously not be valid.

Since existential quantification over flexible variables corresponds to hiding of
state components, the rules ($\boldsymbol{\exists}$I) and ($\boldsymbol{\exists}$E) play a fundamental role in proofs of
refinement for reactive systems. In this context, the "witness" $t$ is often called
a refinement mapping [2]. For example, the concatenation of the two low-level
queues provides a suitable refinement mapping to prove the validity of formula (1),
which claimed that two FIFO queues in a row implement a FIFO queue, assuming
interleaving of changes to the input and output channels.

Although the quantifier rules are standard, one should recall from section 2.2
that care has to be taken when substitutions are applied to formulas that are defined
in terms of quantification. In particular, the formulas $\mathrm{WF}_t(A)$ and $\mathrm{SF}_t(A)$ contain
the subformula $\text{ENABLED}\,\langle A \rangle_t$, and therefore, e.g., $\mathrm{WF}_t(A)[e/v]$ need not be equivalent
to the formula $\mathrm{WF}_{t[e/v]}(A[e/v, e'/v'])$. For a more thorough discussion of this
possible pitfall in system verification, see Lamport's original TLA paper [25].

Unfortunately, refinement mappings need not always exist. For example, ($\boldsymbol{\exists}$I)
cannot be used to prove the valid TLA formula (excluding one-element universes)

    $\boldsymbol{\exists} v : \Box\Diamond\langle \text{TRUE} \rangle_v$    (10)

that asserts the existence of a flexible variable whose value changes infinitely often.
(Such a variable could be used as an "oscillator", triggering system transitions.) In
fact, an attempt to prove (10) by rule ($\boldsymbol{\exists}$I) would require exhibiting a state function
t whose value is certain to change infinitely often in any behavior. However, it is 
easy to show by induction on the syntax of state functions that for any t there exists 
a behavior such that the value of t remains constant forever. 

An approach to solving this problem, advocated in [2], consists in adding aux- 
iliary variables such as history and prophecy variables. Formally, this approach 
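consists in adding special introduction rules for auxiliary variables. The proof of
$G \Rightarrow \boldsymbol{\exists} v : F$ is then reduced to first proving a formula of the form $G \Rightarrow \boldsymbol{\exists} a : G_{\mathit{aux}}$
using a rule for auxiliary variables, and then using the rules ($\boldsymbol{\exists}$E) and ($\boldsymbol{\exists}$I) above to
prove $G \land G_{\mathit{aux}} \Rightarrow \boldsymbol{\exists} v : F$.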

4 FORMALIZED MATHEMATICS: THE ADDED VALUE OF TLA+ 

The definitions of the syntax and semantics of TLA in section 2 were generic in 
terms of an underlying language of predicate logic and its interpretation. TLA+ instantiates
this generic definition of TLA with a specific first-order language, namely
Zermelo-Fraenkel set theory with choice. This adoption of a standard interpretation
enables precise and unambiguous specifications of the "data structures" on which 
specifications are based; we have seen in the example proofs in section 3 that rea- 
soning about the data accounts for most of the steps that need to be proved during 
system verification. TLA + also provides facilities for structuring a specification as a 
hierarchy of modules, for declaring parameters, and most importantly, for defining 
operators. These facilities are essential for writing actual specifications and must 
therefore be mastered by any user of TLA + . However, from the foundational point 
of view adopted in this paper, they are just syntactic sugar. We will therefore con- 
centrate on the set-theoretic foundations, referring the reader to Lamport's book [27] 
for a detailed presentation of the language of TLA+.

4.1 Elementary data structures: basic set theory

The signature of the predicate logic underlying TLA+ contains a single binary predicate
symbol $\in$ and no function symbols (see footnote 4). Terms and formulas at the transition
level are defined as indicated in section 2.2, with an extra term formation rule that
defines $\text{CHOOSE } x : A$ to be a transition function whenever $x \in \mathcal{V}_E$ is a variable
and $A$ is an action (see footnote 5). The occurrences of $x$ in the term $\text{CHOOSE } x : A$ are bound.
To this first-order language corresponds a set-theoretic interpretation: every TLA+
value is a set. Moreover, $\in$ is interpreted as set membership and the interpretation
is equipped with an (unspecified) choice function $\varepsilon$ mapping every non-empty collection
$C$ of values to some element $\varepsilon(C)$ of $C$, and mapping the empty collection
to an arbitrary value. The interpretation of a term $\text{CHOOSE } x : P$ is defined as

    $[\![\text{CHOOSE } x : P]\!]^{\xi}_{s,t} = \varepsilon(\{ d \mid [\![P]\!]^{\xi[x := d]}_{s,t} = \mathbf{t} \})$

This definition employs the choice function to return some value satisfying $P$ provided
there is some such value in the universe of set theory. We should remark that
in this semantic clause, the choice function is applied to a collection that need not
be a set (i.e., an element of the universe); in set-theoretic terminology, $\varepsilon$ applies to
classes and not just to sets. Because $\varepsilon$ is a function, it produces the same value
when applied to equal arguments. It follows that choice satisfies the laws

    $(\exists x : P) \equiv P[(\text{CHOOSE } x : P)/x]$    (11)

    $(\forall x : (P \equiv Q)) \Rightarrow (\text{CHOOSE } x : P) = (\text{CHOOSE } x : Q)$    (12)

TLA+ avoids undefinedness by underspecification [17], so $\text{CHOOSE } x : P$ denotes
a value even if no value satisfies $P$. To ensure that a term involving choice
actually denotes the expected value, the existence of some set satisfying the characteristic
predicate should be proven. If there is more than one such value, the expression
is underspecified, and the user should be prepared to accept any of them. In
particular, any properties will have to be established for all possible values. However,
observe that for a given interpretation, choice is deterministic and not "monotone":
no relationship can be established between $\text{CHOOSE } x : P$ and $\text{CHOOSE } x : Q$ even
when $P \Rightarrow Q$ is valid (unless $P$ and $Q$ are actually equivalent). Therefore, whenever
some specification $\mathit{Spec}$ contains an underspecified application of choice, any
refinement $\mathit{Ref}$ is bound to make the same choices in order to prove $\mathit{Ref} \Rightarrow \mathit{Spec}$;
this situation is quite different from non-determinism where implementations may
narrow the set of allowed values.

In the following, we will freely use many abbreviations defined by TLA+. For
example, $\exists x, y \in S : P$ abbreviates $\exists x : \exists y : x \in S \land y \in S \land P$, and similar
notation applies to $\forall$ and CHOOSE. Local declarations are written as LET _ IN _,
and IF _ THEN _ ELSE _ is used for conditional expressions.

4 Once again, our presentation deviates somewhat from Lamport [27] who prefers to
treat all subsequent constructs on an equal footing rather than distinguishing between
basic and derived operators.

5 Temporal formulas are defined as indicated in section 2.3; in particular, CHOOSE is
never applied to a temporal formula.
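The remark that choice is deterministic but not monotone can be made concrete with a toy model. In the Python sketch below, everything is an assumption of the illustration: `UNIVERSE` is a small finite stand-in for the universe of values, and `choose` fixes one particular choice function (the maximum of the candidates, or 0 for the empty collection), whereas TLA+ leaves the choice function unspecified.

```python
UNIVERSE = range(10)   # hypothetical finite stand-in for the universe of values


def choose(pred):
    """One fixed choice function: depends only on the set of satisfying values."""
    candidates = [x for x in UNIVERSE if pred(x)]
    return max(candidates) if candidates else 0   # arbitrary value if empty


def p(x):
    return x % 2 == 0 and x < 6      # satisfied by 0, 2, 4


def p_alt(x):
    return x in (0, 2, 4)            # different text, same extension as p


def q(x):
    return x % 2 == 0                # weaker predicate: p(x) implies q(x)


print(choose(p) == choose(p_alt))    # True, as required by law (12)
print(p(choose(p)))                  # True, as required by law (11)
print(choose(p), choose(q))          # 4 8: implication gives no relationship
print(choose(lambda x: False))       # 0: still denotes some value
```

A refinement that replaced `p` by the stronger or weaker predicate would therefore not, in general, pick the same value, which is the point made above about underspecified choice versus non-determinism.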



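    union             $\text{UNION } S \triangleq \text{CHOOSE } M : \forall x : (x \in M \equiv \exists T \in S : x \in T)$
    binary union      $S \cup T \triangleq \text{UNION } \{S, T\}$
    subset            $S \subseteq T \triangleq \forall x : (x \in S \Rightarrow x \in T)$
    powerset          $\text{SUBSET } S \triangleq \text{CHOOSE } M : \forall x : (x \in M \equiv x \subseteq S)$
    comprehension 1   $\{x \in S : P\} \triangleq \text{CHOOSE } M : \forall x : (x \in M \equiv x \in S \land P)$
    comprehension 2   $\{t : x \in S\} \triangleq \text{CHOOSE } M : \forall y : (y \in M \equiv \exists x \in S : y = t)$

    Table 1. Basic set-theoretic operators.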



From membership and choice, one can build up the conventional language of
mathematics [28], and this is the foundation for the expressiveness of TLA+. Table 1
lists some of the basic set-theoretic constructs of TLA+; we write

    $\{e_1, \ldots, e_n\} \triangleq \text{CHOOSE } S : \forall x : (x \in S \equiv x = e_1 \lor \ldots \lor x = e_n)$

to denote set enumeration and assume the additional bound variables in the defining
expressions of table 1 to be chosen such that no variable clashes occur. The two
comprehension schemes act as binders for variable $x$, which must not have free
occurrences in $S$. The existence of the sets defined in terms of choice can be justified
from the axioms of Zermelo-Fraenkel set theory [37], which provide the deductive
counterpart to the semantics underlying TLA+. However, it is well-known that
without proper care, set theory is prone to paradoxes. For example, the expression

    $\text{CHOOSE } S : \forall x : (x \in S \equiv x \notin x)$

is a well-formed constant formula of TLA+, but the existence of a set $S$ containing
precisely those sets that do not contain themselves would lead to the contradiction
that $S \in S$ iff $S \notin S$; this is of course Russell's paradox. Intuitively, $S$ is "too big"
to be a set. More precisely, the universe of set theory does not contain collections
that are in bijection with the collection of all sets. Therefore, when evaluating the
above TLA+ expression, the choice function is applied to the empty collection, and
the result depends on the underlying interpretation. Perhaps unexpectedly, we can
however infer from (12) that

    $(\text{CHOOSE } S : \forall x : (x \in S \equiv x \notin x)) = (\text{CHOOSE } x : x \in \{\})$

Similarly, a generalized intersection operator dual to the union operator of table 1
cannot be sensibly defined, because the intersection of the empty set would
have to produce the set of all sets, which we know cannot exist.

On the positive side, we have exploited the fact that no set can contain all values
in the definition

    $\mathit{NoMsg} \triangleq \text{CHOOSE } x : x \notin \mathit{Message}$

that appears in figure 1(b) because $\mathit{NoMsg}$ is guaranteed to denote some value that
is not contained in $\mathit{Message}$. If a later refinement wanted to fix a specific "null"
message value $\mathit{null} \notin \mathit{Message}$, it could do so by restricting the class of admissible
interpretations via an assumption of the form

    $\text{ASSUME } (\text{CHOOSE } x : x \notin \mathit{Message}) = \mathit{null}$

Because all properties established of the original specification hold for all possible 
choices of NoMsg, they will continue to hold for this restricted choice. 

4.2 More data structures 

Some sets can conveniently be interpreted as functions. A traditional approach, 
followed in Z and B [8, 36], is to construct functions via products and relations. 
TLA+ does not prescribe any concrete construction of functions. The set of functions 
whose domain equals S and whose codomain is a subset of T is denoted by [S — ► T], 
the domain of a function / is denoted by domain /, and the application of function 
/ to an expression e is written as f[e]. The expression [x € S »-+ e] denotes the 
function with domain S that maps any x € S to e; again, the variable x must not 
occur in 5 and is bound by the function constructor. Thus, any function / obeys 
the law 

f = [x ∈ DOMAIN f ↦ f[x]]    (13)

and this equation can in fact serve as a characteristic predicate for functional values.
TLA+ introduces a notation for overriding a function at a certain argument position
(a similar concept underlies Gurevich's ASM notation [12]). Formally,

[f EXCEPT ![t] = u] ≜ [x ∈ DOMAIN f ↦ IF x = t THEN u ELSE f[x]]

where x is a fresh variable.
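
The override notation can be illustrated outside TLA+ with a small Python sketch. It assumes that a function with a finite domain is modelled as a dictionary whose keys form the domain; the helper name except_ is hypothetical and is not part of TLA+ or of any tool.

```python
def except_(f, t, u):
    """Model of [f EXCEPT ![t] = u]: same domain as f, value u at argument t."""
    # The comprehension ranges over DOMAIN f only, so the domain never changes.
    return {x: (u if x == t else f[x]) for x in f}


f = {1: "a", 2: "b", 3: "c"}
g = except_(f, 2, "z")

# Law (13): a function equals the function built from its domain and values.
assert f == {x: f[x] for x in f}
# Overriding changes exactly one argument position and leaves f itself untouched.
assert g == {1: "a", 2: "z", 3: "c"}
assert f[2] == "b"
# If t lies outside DOMAIN f, the defining equation yields f unchanged.
assert except_(f, 9, "z") == f
```

The last assertion reflects a consequence of the definition above: overriding at an argument outside the domain has no effect, because the bound variable only ranges over DOMAIN f.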

Combining choice, sets, and function notation, one obtains an expressive language
for defining mathematical structures. For example, the standard TLA+ module
introducing natural numbers defines them as an arbitrary set with constant zero
and successor function satisfying the usual Peano axioms [27, p. 345], and Lamport
goes on to similarly define the integers and the real numbers, ensuring that the basic
arithmetic operations agree rather than having to overload the operation symbols.

Recursive definitions can be introduced in terms of choice, e.g.

factorial ≜ CHOOSE f : f = [n ∈ Nat ↦ IF n = 0 THEN 1 ELSE n * f[n - 1]]

which TLA+, using some syntactic sugar, allows one to write even more concisely as

factorial[n ∈ Nat] ≜ IF n = 0 THEN 1 ELSE n * factorial[n - 1]

Of course, as with any construction based on choice, such a definition should be
justified by proving the existence of a function that satisfies the recursive equation.
Unlike standard semantics of programming languages, TLA+ does not commit to
the least fixed point of a recursively defined function in cases where there are several
solutions.
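
The proof obligation can be made concrete on a finite prefix of Nat. The following Python sketch is only an illustration under that restriction: it builds a candidate function as a dictionary on 0..limit and checks that it satisfies the recursive equation. The names build_factorial and satisfies_equation are hypothetical.

```python
def build_factorial(limit):
    """Construct a candidate f on the domain 0..limit, bottom-up."""
    f = {}
    for n in range(limit + 1):
        f[n] = 1 if n == 0 else n * f[n - 1]
    return f


def satisfies_equation(f):
    """Check f[n] = IF n = 0 THEN 1 ELSE n * f[n - 1] for every n in the domain."""
    return all(f[n] == (1 if n == 0 else n * f[n - 1]) for n in f)


f = build_factorial(5)
assert satisfies_equation(f)
assert f[5] == 120

# A candidate that violates the equation at n = 2 is rejected.
assert not satisfies_equation({0: 1, 1: 1, 2: 3})
```

Exhibiting one function that passes the check corresponds to the existence argument; for this particular equation the solution is also unique, so the question of which fixed point is chosen does not arise.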

Tuples are represented in TLA+ as functions:

⟨t_1, ..., t_n⟩ ≜ [i ∈ 1..n ↦ IF i = 1 THEN t_1 ... ELSE t_n]

where 1..n denotes the set {j ∈ Nat : 1 ≤ j ∧ j ≤ n} (and i is a "fresh" variable),
and selection of the i-th element is just function application. Strings are defined
as tuples of characters, and records are represented as functions whose domains are
finite sets of strings. The update operation on functions can thus be applied to
tuples and records as well. For record selection and update, the concrete syntax
allows one to write, for example, acct.balance instead of acct["balance"].



Seq(S)        ≜  UNION {[1..n → S] : n ∈ Nat}
Len(s)        ≜  CHOOSE n ∈ Nat : DOMAIN s = 1..n
Head(s)       ≜  s[1]
Tail(s)       ≜  [i ∈ 1..(Len(s) - 1) ↦ s[i + 1]]
s ∘ t         ≜  [i ∈ 1..Len(s) + Len(t) ↦
                    IF i ≤ Len(s) THEN s[i] ELSE t[i - Len(s)]]
Append(s, e)  ≜  s ∘ ⟨e⟩

Fig. 3. Finite sequences.



The standard TLA+ module Sequences imported by the specification of the
FIFO queue in figure 1(b) represents finite sequences as tuples. The definitions of
the standard operations, some of which are shown in figure 3, are therefore quite
simple. However, this simplicity can sometimes be deceptive. For example, these
definitions do not reveal that the Head and Tail operations are "partial". They
should be validated by proving the expected properties, such as

∀s ∈ Seq(S) : Len(s) ≥ 1 ⇒ s = ⟨Head(s)⟩ ∘ Tail(s)
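
A small Python model can make this validation step tangible. The sketch below assumes that a finite sequence is a dictionary with domain 1..n, and that Head and Tail follow the usual definitions of the Sequences module (first element, and the sequence shifted down by one index). All function names are hypothetical stand-ins for the TLA+ operators.

```python
def seq(*elems):
    """Tuple <<e1, ..., en>> as a function with domain 1..n."""
    return {i + 1: e for i, e in enumerate(elems)}


def length(s):
    """Len(s): the n with DOMAIN s = 1..n."""
    return len(s)


def head(s):
    """Head(s) = s[1]; raises KeyError on the empty sequence."""
    return s[1]


def tail(s):
    """Tail(s) = [i in 1..Len(s)-1 |-> s[i+1]]."""
    return {i: s[i + 1] for i in range(1, len(s))}


def concat(s, t):
    """s o t, following the index arithmetic of figure 3."""
    return {i: (s[i] if i <= len(s) else t[i - len(s)])
            for i in range(1, len(s) + len(t) + 1)}


def append(s, e):
    """Append(s, e) = s o <<e>>."""
    return concat(s, seq(e))


s = seq("a", "b", "c")
# The expected property for a non-empty sequence.
assert length(s) >= 1 and s == concat(seq(head(s)), tail(s))
assert append(seq("a", "b"), "c") == s
```

In this model the partiality of Head shows up as a KeyError on the empty sequence, whereas in TLA+ the expression Head(⟨⟩) is merely an unspecified value. That difference is one reason why properties such as the one above need to be proved rather than read off the definitions.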
5 CONCLUSIONS 

The design of software systems requires a combination of ingenuity and careful 
engineering. While there is no substitute for intuition, the correctness of a proposed 
solution can be checked by precise reasoning over a suitable model, and this is the 
realm of logics and (formalized) mathematics. The role of a formalism is to help the 
user in the difficult and important activity of writing and analysing formal models. 
TLA+ builds on the experience of classical mathematics and adds just a thin layer of
temporal logic in order to describe executions as sets of traces. A distinctive feature
of TLA is its attention to refinement and composition, reflected in the concept of
stuttering invariance.

Whereas the expressiveness of TLA+ undoubtedly helps in writing concise, high-level
models of systems, one may wonder whether it lends itself as well to the analysis
of these models. For example, we have pointed out several times the need to prove
conditions of "well-definedness" related to the use of the choice operator. On the
other hand, Zermelo-Fraenkel set theory with choice is probably the most widely
used foundation of classical mathematics, and there are well-known idioms, such as
primitive-recursive definitions, that ensure well-definedness. For the specification of
reactive systems, TLA adds some proper idioms that control the delicate interplay
between temporal operators, e.g. in order to ensure that a specification is machine
closed [3].

Deductive verification of TLA+ specifications can be supported by proof assistants,
and in fact several encodings of TLA in the logical frameworks of different
theorem provers have been proposed [15, 20, 31], although no prover has yet been
designed to support full TLA+. Perhaps more surprisingly, there has been much recent
activity on developing a toolset based on the TLC model checker and simulator to
aid in validating and debugging TLA+ models [39], and this toolset has been applied
in industrial development projects. Obviously, model checking is possible only for
a sublanguage of TLA+, but interestingly, most real-world specifications are either
written in this sublanguage or can be translated into it using minor transformations.
The modeling language of TLC is still much more expressive than that of most other
model checkers and therefore helps users to write concise system specifications.

Emacs

From Wikipedia, the free encyclopedia 

This article is about the text editor. For the unrelated Apple Macintosh computer model see eMac. 



Emacs is a class of text editors, possessing an extensive set of 
features, that are popular with computer programmers and 
other technically proficient computer users. 

GNU Emacs, a part of the GNU project, is under active 
development and is the most popular version. The GNU 
Emacs manual describes it as "the extensible, customizable, 
self-documenting, real-time display editor." It is also the most 
portable and ported of the implementations of Emacs. As of 
2007, the latest stable release of GNU Emacs is version 21.4. 

The original EMACS, a set of Editor MACroS for the TECO
editor, was written in 1975 by Richard Stallman, initially put
together with Guy Steele. It was inspired by the ideas of
TECMAC and TMACS, a pair of TECO-macro editors written
by Guy Steele, Dave Moon, Richard Greenblatt, Charles 
Frankston, and others. Many versions of Emacs have appeared 
over the years, but nowadays there are two that are commonly 
used: GNU Emacs, started by Richard Stallman in 1984 and 
still maintained by him, and XEmacs, a fork of GNU Emacs 
which was started in 1991 and has remained mostly 
compatible. Both use a powerful extension language, Emacs 
Lisp, that allows them to handle tasks ranging from writing 
and compiling computer programs to browsing the web. 

Some people make a distinction between the capitalized word 
"Emacs", used to refer to editors derived from versions created 
by Richard Stallman, and the lower-case word "emacs", which 
is used to refer to the large number of independent emacs 
reimplementations. The word "emacs" is often pluralized as 
emacsen by analogy with "oxen". For example, Debian's compatible Emacs package is named emacsen-common. The
only plural given by the Collins English Dictionary is emacsen.

In Unix culture, Emacs is one of the two main contenders in the traditional editor wars, the other being vi.

License: GPL
Website: www.gnu.org/software/emacs/

History

Emacs began life at the MIT AI Lab during the 1970s. Before its introduction, the default editor on the Incompatible
Timesharing System (ITS), the operating system on the AI Lab's PDP-6 and PDP-10 computers, was a line editor 
known as TECO. Unlike modern text editors, TECO treated typing, editing, and document display as separate modes, 
as the later vi would. Typing characters into TECO did not place those characters directly into a document; one had to 
write a series of instructions in the TECO command language telling it to enter the required characters, during which 
time the edited text was not displayed on the screen. This behavior is similar to the program ed, which is still in use. 

Richard Stallman visited the Stanford AI Lab in 1972 or 1974 and saw the lab's "E" editor, written by Fred Wright. The 
editor had an intuitive WYSIWYG behavior as is used almost universally by modern text editors. Impressed by this 
feature, Stallman returned to MIT where Carl Mikkelsen, one of the hackers at the AI Lab, had added a display-editing 
mode called "Control-R" to TECO, allowing the screen display to be updated each time the user entered a keystroke. 
Stallman reimplemented this mode to run efficiently, then added a macro feature to the TECO display-editing mode, 
allowing the user to redefine any keystroke to run a TECO program. 

Another feature of E which was lacking in TECO was random-access editing. Since TECO's original implementation 
was designed for editing paper tape on the PDP-1, it was a page-sequential editor. Typical editing could only be 
performed on one page at a time, in the order that the pages appeared in the file. To provide random access in Emacs, 
Stallman elected not to adopt E's approach of structuring the file for page-random access on disk, but instead modified 
TECO to handle large buffers more efficiently, and then changed its file management philosophy to read, edit, and 
write the entire file as a single buffer. Almost all modern editors use this approach. 

The new version of TECO was instantly popular at the AI Lab, and soon there accumulated a large collection of 
custom macros, whose names often ended in "MAC" or "MACS", which stood for "macros". Two years later, Guy 
Steele took on the project of unifying the overly diverse keyboard command sets into a single set. After one night of 
joint hacking by Steele and Stallman, the latter finished the implementation, which included facilities for extending and 
documenting the new macro set. The resulting system was called EMACS, which stood for "Editing MACroS". An 
alternate version is that EMACS stood for "E with MACroS", a dig at E's lack of a macro capability. According to 
Stallman, he picked the name Emacs "because <E> was not in use as an abbreviation on ITS at the time." It has also
been pointed out that "Emack & Bolio's" was the name of a popular ice cream store in Boston, within walking distance
of MIT. A text-formatting program used on ITS was later named BOLIO by Dave Moon, who frequented that store.
However, Stallman did not like that ice cream, and did not even know of it when choosing the name "Emacs"; this
ignorance is the basis of a hacker koan, "Emacs and Bolio".

Stallman realized the danger of too much customization and de-facto forking and set certain conditions for usage. He 
later wrote: 

"EMACS was distributed on a basis of communal sharing, which means all improvements must be given back to 
me to be incorporated and distributed."

The original Emacs, like TECO, ran only on the PDP line. Its behavior was different enough from TECO to be 
considered a text editor in its own right. It quickly became the standard editing program on ITS. It was also ported 
from ITS to the Tenex and TOPS-20 operating systems by Michael McMahon, but not Unix, initially. Other 
contributors to early versions of Emacs include Kent Pitman, Earl Killian, and Eugene Ciccarelli. 

Other emacsen 

Many Emacs-like editors were written in the following years for other computer systems, including SINE ("SINE Is Not
EMACS"), EINE ("EINE Is Not EMACS") and ZWEI ("ZWEI Was EINE Initially", for the Lisp machine), which were
written by Michael McMahon and Daniel Weinreb ("eine" and "zwei" mean "one" and "two" in German, respectively).
In 1978, Bernard Greenberg wrote Multics Emacs at Honeywell's Cambridge Information Systems Lab, which was the 
first version to fully embrace Lisp as its extension language. Emacs (including GNU Emacs) later adopted using Lisp 
as the editor's extension language. 

The first Emacs-like editor to run on Unix was Gosling Emacs, written in 1981 by James Gosling (who later invented 
NeWS and the Java programming language). It was written in C and, notably, used a language with Lisp-like syntax 
known as Mocklisp as an extension language. In 1984 it was proprietary software. 

GNU Emacs 

In 1984, Stallman began working on GNU Emacs to produce a free software alternative to Gosling Emacs; initially it 
was based on Gosling Emacs, but Stallman replaced the Mocklisp interpreter at its heart with a true Lisp interpreter, 
which entailed replacing nearly all of the code. It became the first program released by the nascent GNU project. GNU 
Emacs is written in C and provides Emacs Lisp (itself implemented in C) as an extension language. The first widely 
distributed version of GNU Emacs was 15.34, which appeared in 1985. (Versions 2 to 12 never existed. Earlier
versions of GNU Emacs had been numbered "1.x.x", but sometime after version 1.12 the decision was made to drop
the "1", as it was thought the major number would never change. Version 13, the first public release, was made on
March 20, 1985.) 

Like Gosling Emacs, GNU Emacs ran on Unix; however, GNU Emacs had more features, in particular a full-featured 
Lisp as extension language. As a result, it soon replaced Gosling Emacs as the de facto Emacs editor on Unix. 

Until 1999, GNU Emacs development was relatively closed, to the point where it was used as an example of the 
"Cathedral" development style in The Cathedral and the Bazaar. The project has since adopted a public development 
mailing list and anonymous CVS access. Development takes place in a single CVS trunk, which is at version 22.0.96. 
The current maintainer is Richard Stallman. 

XEmacs

Beginning in 1991, Lucid Emacs was developed by Jamie Zawinski and others at Lucid Inc., based on an early alpha
version of GNU Emacs 19. The codebases soon diverged, and the separate development teams gave up trying to
merge them back into a single program. This was one of the most famous early forks of a free software program. Lucid
Emacs has since been renamed XEmacs; it and GNU Emacs remain the two most popular varieties in use today.

Other implementations 

GNU Emacs was initially targeted at computers with a 32-bit flat address space and at least 1 MiB of RAM, at a time
when such computers were considered high end. This left an opening for smaller reimplementations. Some noteworthy
ones are listed here:

■ MicroEMACS, a very portable implementation originally written by Dave Conroy and further developed by
Daniel Lawrence, which exists in many variations. It is the editor used by Linus Torvalds.

■ MG, originally called MicroGNUEmacs, an offshoot of MicroEMACS intended to more closely resemble GNU
Emacs. Now installed by default on OpenBSD.

■ JOVE (Jonathan's Own Version of Emacs), a non-programmable Emacs implementation for UNIX-like systems
by Jonathan Payne.

■ Freemacs, a DOS version with an extension language based on text macro expansion, all within the original 64
KiB flat memory limit.

■ Meadow, an Emacs variant originating from Japan that is designed to operate under Windows. The focus of
Meadow is to provide multi-lingual support.

■ MINCE (MINCE Is Not Complete Emacs), a version for CP/M from Mark of the Unicorn. MINCE evolved into
Final Word, which eventually became the Sprint word processor from Borland.

■ SXEmacs, a fork of XEmacs 21.4.16 led by former XEmacs developer Steve Youngs. It aims to integrate
Emacs into the X Window System and includes an MP3 player and productivity software.

■ Zile 

Licensing 

For GNU Emacs (and GNU packages in general), it remains policy to accept significant code contributions only if the
copyright holder executes a suitable disclaimer or assignment of their copyright interest, although one exception was
made to this policy for the MULE (MULtilingual Extension, which handles Unicode and more advanced methods of
dealing with other languages' scripts) code, since the copyright holder is the Japanese government and copyright
assignment was not possible. This does not apply to extremely minor code contributions or bug fixes. There is no strict
definition of minor, but as a guideline fewer than 10 lines of code is considered minor. This policy is intended to
facilitate copyleft enforcement, so that the FSF can defend the software in a court case if one arises.

Features 

Emacs is a powerful and versatile text editor. It is primarily a text editor, not a word processor; it is geared toward 
manipulating pieces of text, rather than manipulating the font of the characters or printing documents (though Emacs 
can do these as well). Emacs provides commands to manipulate words and paragraphs (deleting them, moving them, 
moving through them, and so forth), syntax highlighting for making source code easier to read, and "keyboard macros" 
for performing arbitrary batches of editing commands defined by the user. 

Almost all of the functionality in the editor, ranging from basic editing operations such as the insertion of characters
into a document to the configuration of the user interface, is controlled by a dialect of the Lisp programming language
known as Emacs Lisp. This unique and unusual design provides many of the features found in Emacs. In this Lisp
environment, variables and even entire functions can be modified on the fly, without having to recompile or even
restart the editor. As a result, the behavior of Emacs can be modified almost without limit, either directly by the user,
or by loading bodies of Emacs Lisp code known variously as "libraries", "packages", or "extensions".

Emacs contains a large number of Emacs Lisp libraries, and more "third-party" libraries can be found on the Internet.
Many libraries implement computer programming aids, reflecting Emacs' popularity among programmers. Emacs can
be used as an Integrated Development Environment (IDE), allowing programmers to edit, compile, and debug their
code within a single interface. Other libraries perform more unusual functions. A few examples are listed below:

■ Calc, a powerful RPN numerical calculator 

■ Calendar-mode, for keeping appointment calendars and diaries 

■ Doctor, an implementation of ELIZA that performs basic Rogerian psychotherapy 

■ Dunnet, a text adventure 

■ Ediff, for working with diff files interactively

■ Emerge, for comparing files and combining them 

■ Emacs/W3, a web browser 

■ ERC, an IRC client 

■ Gnus, a full-featured newsreader and email client 

■ MULE, MultiLingual extensions to Emacs, allowing editing text written in multiple languages, somewhat 
analogous to Unicode 

■ Info, an online help-browser 

■ Emacs-wiki, LISP-based wiki software for Emacs 

■ Planner, a personal information manager for Emacs 

■ Tetris

■ Pong

The downside to Emacs' Lisp-based design is a performance overhead resulting from loading and interpreting the Lisp
code. On the systems in which Emacs was first implemented, Emacs was often noticeably slower than rival text
editors. Several joke backronyms allude to this: Eight Megabytes And Constantly Swapping (from the days when eight
megabytes was a lot of memory), Emacs Makes A Computer Slow, Eventually Mallocs All Computer Storage, and
Eventually Makes All Computers Sick. However, modern computers are fast enough that Emacs is seldom felt to be
slow. In fact, Emacs starts up more quickly than most modern word processors. Other joke backronyms describe the
user interface: Escape Meta Alt Control Shift.

Platforms 

Emacs is one of the most ported non-trivial computer programs. It runs on a wide variety of operating systems,
including most Unix-like systems (GNU/Linux, the various BSDs, Solaris, AIX, IRIX, Mac OS X, etc.), MS-DOS,
Microsoft Windows, and OpenVMS. Unix systems, both free and proprietary, frequently provide Emacs
bundled with the operating system.

Emacs runs on both text terminals and graphical user interface (GUI) environments. On Unix-like operating systems, 
Emacs uses the X Window System to produce its GUI, either directly or using a "widget toolkit" such as Motif, 
LessTif, or GTK+. Emacs can also use the native graphical systems of Mac OS X (using the Carbon interface) and 
Microsoft Windows. The graphical interface provides menubars, toolbars, scrollbars, and context menus. 

Editing modes

Emacs adapts its behavior to the type of text it is editing by entering editing modes called "major modes". Major modes
are defined for ordinary text files, source code for many programming languages, HTML documents, TeX and LaTeX
documents, and many other types of text. Each major mode tweaks certain Emacs Lisp variables to make Emacs
behave more conveniently for the particular type of text. In particular, they usually implement syntax highlighting,
using different fonts or colors to display keywords, comments, and so forth. Major modes also provide special editing
commands; for example, major modes for programming languages usually define commands to jump to the beginning 
and the end of a function. 

The behavior of Emacs can be further customized using "minor modes". While only one major mode can be associated 
with a buffer at a time, multiple minor modes can be simultaneously active. For example, the major mode for the C 
programming language defines a different minor mode for each of the popular indent styles. 

Customization 

Emacs can be customized to suit individual needs. There are three primary ways to customize Emacs. The first is the 
customize extension, which allows the user to set common customization variables, such as the colour scheme, using a 
graphical interface. This is intended for Emacs beginners who do not want to work with Emacs Lisp code. 

The second is to collect keystrokes into macros and replay them to automate complex, repetitive tasks. This is often 
done on an ad-hoc basis and each macro discarded after use, although macros can be saved and invoked later. 

The third method for customizing Emacs is using Emacs Lisp. Usually, user-supplied Emacs Lisp code is stored in a
file called .emacs, which is loaded when Emacs starts up. The .emacs file is often used to set variables and key
bindings different from the default setting, and to define new commands that the user finds convenient. Many advanced
users have .emacs files hundreds of lines long, with idiosyncratic customizations that cause Emacs to diverge wildly
from the default behavior.

If a body of Emacs Lisp code is generally useful, it is often packaged as a library and distributed to other users. Many 
such third-party libraries can be found on the Internet; for example, there is a library called wikipedia-mode for editing 
Wikipedia articles. There is even a Usenet newsgroup, gnu.emacs.sources, which is used for posting new libraries. 
Some third-party libraries eventually make their way into Emacs, thus becoming a "standard" library. 

Documentation 

The first Emacs included an innovative help library that could display the documentation for every single command,
variable, and internal function. (It may have originated this technique.) Because of this, Emacs was described as
"self-documenting". (This term does not mean that Emacs writes its own documentation, but rather that it presents its own
documentation to the user.) This feature makes Emacs' documentation very accessible. For example, the user can find
out about the command bound to a particular keystroke simply by entering C-h k (which runs the command
describe-key), followed by the keystroke. Each function included a documentation string, specifically to be used for
showing to the user on request. The practice of giving functions documentation strings subsequently spread to various
programming languages such as Lisp and Java.

The Emacs help system is useful not only for beginners, but also for advanced users writing Emacs Lisp code. If the
documentation for a function or variable is not enough, the help system can be used to browse the Emacs Lisp source
code for both built-in libraries and installed third-party libraries. It is therefore very convenient to program in Emacs
Lisp using Emacs itself.

Apart from the built-in documentation, Emacs has an unusually long, detailed and well-written manual. An electronic
copy of the GNU Emacs Manual, written by Richard Stallman, is included with GNU Emacs and can be viewed with
the built-in Info browser. XEmacs has a similar manual, which forked from the GNU Emacs Manual at the same time
as the XEmacs software. Two other manuals, the Emacs Lisp Reference Manual by Bill Lewis, Richard Stallman, and
Dan Laliberte, and Programming in Emacs Lisp by Robert Chassell, are also included. Apart from the electronic
versions, all three manuals are also available in book form, published by the Free Software Foundation.

Emacs also has a built-in tutorial. When Emacs is started with no file to edit, it displays instructions for performing 
simple editing commands and invoking the tutorial. 

Internationalization 

Emacs supports the editing of text written in many human languages. There is support for many alphabets, scripts, 
writing systems, and cultural conventions. Emacs provides spell checking for many languages by calling external 
programs such as ispell. Many encoding systems, including UTF-8, are supported. XEmacs version 21.5 has partial
Unicode support. Emacs 21.4 has similar support; Emacs 22 is expected to improve on it. All of these efforts use an
Emacs-specific encoding internally, necessitating conversion upon load and save. UTF-8 will become the Emacs-internal
encoding in some later version of XEmacs 21.5, and likely in Emacs 23.

However, the Emacs user interface is in English, and has not been translated into any other language, with the
exception of the beginners' tutorial.

For visually impaired and blind users, there is a subsystem called Emacspeak which allows the editor to be used 
through audio feedback only. 

License 

The source code, including both the C and Emacs Lisp components, is freely available for examination, modification, 
and redistribution, under the terms of the GNU General Public License (GPL). Older versions of the GNU Emacs 
documentation were released under an ad-hoc license which required the inclusion of certain text in any modified 
copy. In the GNU Emacs user's manual, for example, this included how to obtain GNU Emacs and Richard Stallman's
political essay "The GNU Manifesto". The XEmacs manuals, which were inherited from older GNU Emacs manuals
when the fork occurred, have the same license. The newer versions of the GNU Emacs documentation, meanwhile,
use the GNU Free Documentation License and make use of "invariant sections" to require the inclusion of the same
documents, additionally requiring that the manuals proclaim themselves as GNU Manuals.

Using Emacs 

Commands 

From the Unix shell, a file can be opened for editing by typing "emacs [filename]". If the file name you entered does
not exist, a file will be created with that name. For example, "emacs xorg.conf" will edit the xorg.conf file in the current
directory, if it exists. However, Emacs documentation recommends starting Emacs without a file name, to avoid the
bad habit of starting a separate Emacs for each file you edit. Visiting all files in a single Emacs process is the way to 
get the full benefit of Emacs. 

In the normal editing mode, Emacs behaves just like other text editors: the character keys (a, b, c, 1, 2, 3, etc.) insert
the corresponding characters, the arrow keys move the editing point, backspace deletes text, and so forth. Other
commands are invoked with modified keystrokes, pressing the control key and/or the meta key/alt key in conjunction
with a regular key. Every editing command is actually a call to a function in the Emacs Lisp environment. Even a
command as simple as typing a to insert the character a involves calling a function, in this case
self-insert-command.

Some of the basic commands are shown below. More can be found at List of Emacs commands. The control key [Ctrl] 
is denoted by a capital C, and the meta or alt [Alt] key by a capital M. 



Command                   Keystroke          Description
forward-word              M-f                Move forward past one word.
search-word               C-s                Search for a word in the buffer.
undo                      C-/                Undo last change, and prior changes if pressed repeatedly.
keyboard-quit             C-g                Abort the current command.
fill-paragraph            M-q                Wrap text in ("fill") a paragraph.
find-file                 C-x C-f            Visit a file (you specify the name) in its own editor buffer.
save-buffer               C-x C-s            Save the current editor buffer in its visited file.
write-file                C-x C-w            Save the current editor buffer as a file with the name you specify.
save-buffers-kill-emacs   C-x C-c            Offer to save changes, then exit Emacs.
set-marker                C-[space] / C-@    Set a marker from where you want to cut or copy.
cut                       C-w                Cut all text between the marker and the cursor.
copy                      M-w                Copy all text between the marker and the cursor.
paste                     C-y                Paste text from the Emacs clipboard.
kill-buffer               C-x k              Kill the current buffer.



Alternatively, if a user would prefer IBM Common User Access style keys, "cua-mode" can be used. This has been a
third-party package up to, and including, GNU Emacs 21, but is included in GNU Emacs 22 (beta).

Note that the commands save-buffer and save-buffers-kill-emacs use multiple modified keystrokes. For
example, C-x C-c means: while holding down the control key, press x; then, while holding down the control key, press
c. This technique, allowing more commands to be bound to the keyboard than with the use of single keystrokes alone,
was popularized by Emacs, which got it from TECMAC, one of the TECO macro collections that immediately
preceded Emacs. It has since made its way into modern code editors like Visual Studio.
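
The idea of multi-keystroke bindings can be sketched as a lookup in nested tables, where a prefix key such as C-x maps to a further table rather than to a command. The Python model below is a toy illustration only: the keymap contents are taken from the command table above, and the lookup function is hypothetical and does not reproduce how Emacs implements keymaps.

```python
# Toy keymap: a string value is a command name, a dict value is a prefix table.
KEYMAP = {
    "C-g": "keyboard-quit",
    "M-f": "forward-word",
    "C-x": {
        "C-f": "find-file",
        "C-s": "save-buffer",
        "C-c": "save-buffers-kill-emacs",
    },
}


def lookup(keymap, keys):
    """Follow a key sequence through nested tables.

    Returns a command name, a dict if the sequence is an incomplete prefix,
    or None if the sequence is not bound.
    """
    node = keymap
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


assert lookup(KEYMAP, ["C-x", "C-c"]) == "save-buffers-kill-emacs"
assert lookup(KEYMAP, ["C-g"]) == "keyboard-quit"
assert isinstance(lookup(KEYMAP, ["C-x"]), dict)  # prefix: more keys expected
assert lookup(KEYMAP, ["C-x", "C-q"]) is None     # unbound in this toy keymap
```

With a single prefix key, every key that follows it becomes available for a second set of bindings, which is how the technique multiplies the number of commands reachable from the keyboard.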

When Emacs is running a graphical interface, many commands can be invoked from the menubar or toolbar instead of 
using the keyboard. However, many experienced Emacs users prefer to use the keyboard because it is faster and more 
convenient once the necessary keystrokes have been memorized. 

Some Emacs commands work by invoking an external program (such as ispell for spell-checking or gcc for program 
compilation), parsing the program's output, and displaying the result in Emacs. 

Minibuffer 

The minibuffer, normally the bottommost line, is where Emacs requests information. Text to target in a search, the
name of a file to read or save, and similar information is entered in the minibuffer. When applicable, tab completion is
usually available.

File management and display 

Emacs keeps text in objects called buffers. The user can create new buffers and dismiss unwanted ones, and several 
buffers can exist at the same time. Most buffers contain text loaded from text files, which the user can edit and save 
back to disk. Buffers are also used to store temporary text, such as the documentation strings displayed by the help 
library. 

In both text terminal and graphical modes, Emacs is able to split the editing area into separate sections (referred to 
since 1975 as "windows", which can be confusing on systems that have another concept of "windows" as well), so that 
more than one buffer can be displayed at a time. This has many uses. For example, one section can be used to display 
the source code of a program, while another displays the results from compiling the program. In graphical 
environments, Emacs can also launch multiple graphical-environment windows, known as "frames" in the context of 
Emacs. 

Emacs Pinky 

Because Emacs depends heavily on the modifier keys, in particular the control key, which is pressed with the little finger
("pinky"), heavy Emacs users have experienced pain in their pinky fingers (see repetitive strain injury and fat-finger).
This has been dubbed the "Emacs Pinky", and vi advocates often cite it as a reason to switch to vi. To alleviate this
situation, many Emacs users transpose the left control key and the left caps-lock key or define both as control keys.
There are also keyboards such as the Kinesis Contoured Keyboard, which reduces the strain by moving the modifier keys
so that they can easily be pressed with the thumb, and the Microsoft Natural keyboard, which has large modifier
keys placed symmetrically on both sides of the keyboard so that they can be pressed with the palm.

Distractions 

In addition to its many features, GNU Emacs includes a variety of unusual distractions designed to amuse and/or 
annoy. 

■ M-x life renders Conway's Game of Life

■ M-x gomoku launches a game of Gomoku

■ M-x psychoanalyze-pinhead pipes Zippy the Pinhead quotes through ELIZA Doctor 

■ M-x spook outputs a string of random words designed to distract anyone from the NSA who might be listening 
in 

See also 

■ Comparison of text editors 

■ GNU TeXmacs 

■ List of text editors 

■ List of Unix programs